Orion Network Architecture

A fairly in-depth dive into a robust home network.

Motivation

What do we want our network to do?

  • Be scalable
    • We want to be able to lab basically anything, at any time, of nearly any (reasonable) complexity. We want to be able to create functionally infinite environments for our labbing, prototyping, developing, etc. We also want to manage how much these transient services can interact with the rest of our network.
  • Security
    • Segmenting our networks this way gives us precise, tunable control over every network, every link, and every packet that traverses any device. This becomes immensely valuable for concepts such as:
      • Blocking baked-in DNS (devices that ignore the DNS servers assigned by DHCP and fall back to a hard-coded resolver such as 8.8.8.8) by disallowing all outbound DNS that doesn’t originate at our recursive forwarders
      • Blocking ads at the DNS level
      • Preventing unintended phone-home services from functioning on specified devices
      • Dictating precisely which things a given IoT device can interact with
  • Speed
    • We want high-speed links between the NAS, the hypervisor, and the WiFi 6E access point, preferably without having to implement policy-based routing on every server.
  • Preserve our dynamic DNS so our records stay current even if the ISP’s DHCP does something unexpected, such as handing us a new address
  • Dictate which inbound WAN traffic is allowed
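
The baked-in DNS rule boils down to a single predicate on the firewall. The Python sketch below models that predicate, assuming (hypothetically) that every address in the DNS segment (10.90.0.0/16, VLAN 1900) counts as a trusted recursive forwarder. It is a model of the policy for illustration, not router configuration.

```python
import ipaddress

# Assumption: all recursive forwarders live in the DNS segment (VLAN 1900).
TRUSTED_FORWARDERS = ipaddress.ip_network("10.90.0.0/16")
DNS_PORT = 53


def allow_outbound(src: str, dst_port: int, dst_is_wan: bool) -> bool:
    """Decide whether a packet headed out of the LAN may pass.

    Only plain DNS to the WAN is filtered here; any other traffic falls
    through (modelled as "allowed") to whatever other rules exist.
    """
    if dst_port != DNS_PORT or not dst_is_wan:
        return True
    return ipaddress.ip_address(src) in TRUSTED_FORWARDERS


print(allow_outbound("10.70.3.4", 53, True))   # IoT device hard-coded to 8.8.8.8 -> False
print(allow_outbound("10.90.10.5", 53, True))  # the PiHole forwarding upstream -> True
```

A port-based rule like this only catches classic DNS on port 53; a resolver tunnelled over HTTPS looks like ordinary web traffic.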


Chapter 1

Introduction

An overview of our implementation


Hardware

For the purposes of this documentation, we will primarily be concerned with the network and switching fabric itself. Although we will reference and detail many of our various devices, construction of those will largely be left as an exercise to the reader.

Routing

We will be implementing two nominal routers for this network. First, our Edge router will be responsible for tasks related to the WAN. Second, our Internal router will be the gateway for our networks and serve as our primary network firewall.

Edge Router

What is an Edge Router?

Broadly speaking, an edge router is the device that accepts inbound traffic into the network, and directs outbound traffic to your ISP or peerings. Edge routers are well placed to handle QoS and mitigate bottlenecks from the core network.

For example, if we run Differentiated Services Code Point (DSCP) classification within the network, we can create a service policy at the Edge that queues traffic based on priorities we define. Perhaps we have a media server that streams to the WAN, but we want to set our personal desktop to have priority over that bulk stream traffic in the event of congestion.
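
To make the queueing idea concrete, here is a toy strict-priority scheduler in Python. The DSCP-to-priority mapping (EF/46 for the desktop, CS1/8 for bulk media, everything else best effort) is a hypothetical policy chosen for illustration; it is not VyOS configuration and not how any particular router implements its queues.

```python
import heapq
from itertools import count

# Hypothetical policy: lower number = served first.
PRIORITY_BY_DSCP = {46: 0, 0: 1, 8: 2}  # EF -> desktop, default -> best effort, CS1 -> bulk
DEFAULT_PRIORITY = 1


class StrictPriorityQueue:
    """Toy egress queue: always send the highest-priority packet first,
    keeping arrival order within a priority class."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()  # tie-breaker preserves FIFO order inside a class

    def enqueue(self, dscp: int, payload: str) -> None:
        prio = PRIORITY_BY_DSCP.get(dscp, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (prio, next(self._seq), payload))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = StrictPriorityQueue()
q.enqueue(8, "media-1")
q.enqueue(8, "media-2")
q.enqueue(46, "desktop-1")
q.enqueue(0, "web-1")
print([q.dequeue() for _ in range(len(q))])
# ['desktop-1', 'web-1', 'media-1', 'media-2']
```

Strict priority is the simplest scheme, but under sustained congestion it can starve the bulk class entirely, which is why real service policies often use weighted or rate-limited queues instead.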

Additionally, edge routers can serve as an entry point into the LAN via various VPN schemes, allowing a remote client to land on the internal network, but still be classified as “external” traffic on the interior firewall (as from the firewall’s perspective, that traffic originates from the Edge router).

Hardware Selection

Ideally, we’ll pick a device that can run our router operating system of choice (VyOS), that has a relatively low TDP, low-noise or no-noise, and at least two Gigabit Ethernet ports. Naturally, that gives us a wide set of options.

  • ZimaBoard - This is what I selected. I opted for the ZimaBoard 432 due to its relatively high PassMark score, 4GB of RAM, 6W TDP, dual GbE NICs, and low cost (I paid $127.92 at checkout). Additionally, should I want to upgrade in the future, it has a PCIe 2.0 x4 slot on its side. One could very easily just slap an Intel NIC on as an expansion.

Internal Router

What is our Internal Router?

The internal, or core, router serves multiple roles in our network. Not only is it the default gateway for virtually all of our network segments, but it also handles the bulk of our firewall rules, serves internal DHCP, maintains our dynamic DNS, and runs a variety of monitoring tasks. The core router sits directly in the middle of our network, and one of our main goals is to direct virtually all traffic through it. Running all of these processes can consume a lot of resources (CPU/RAM), which is another reason we keep it separate from the duties of the Edge Router.

Hardware Selection

Switching

What do we need in a switch?

Our switch should be managed and support 802.1Q (VLAN) tagging and link aggregation. Additional features we might be interested in are DSCP-based QoS and LACP (Link Aggregation Control Protocol) support.
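
802.1Q tagging works by inserting a 4-byte tag into each Ethernet frame after the source MAC address. The Python sketch below packs that tag from the standard field layout, purely to show what a switch adds on a tagged port; the helper name `dot1q_tag` is hypothetical. One consequence worth noting: the VLAN ID field is only 12 bits wide, so usable IDs run from 1 to 4094, and an ID above that range (such as 7001) cannot be configured on 802.1Q equipment.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag.

    Layout: 16-bit TPID, then the 16-bit TCI = PCP (3 bits) | DEI (1 bit) | VID (12 bits).
    """
    if not 1 <= vid <= 4094:  # 0 and 4095 are reserved
        raise ValueError(f"VLAN ID {vid} does not fit the 12-bit VID field")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must be 0-7 and DEI must be 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


print(dot1q_tag(1100).hex())  # Generic Users VLAN -> 8100044c
```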

Hardware Selection
  • The TP-Link TL-SG1024DE does not support LACP, but handles a static (non-LACP) LAG fine. For my purposes, this is satisfactory.

Wireless

More details about wireless are to come.

Hardware Selection

Additional Hardware

As stated above, this section purely lists examples of what I have. These devices are not mandatory, but are listed for reference.

Network Attached Storage

I have a whiteboxed Linux system that runs a ZFS RAID-Z2 (comparable to RAID 6) array of six 8 TB drives. It has a 4x1GbE Intel NIC, and having a 4-port LAG to the switch is a priority. This device doesn’t run any services beyond NFS/SMB, but ZFS is fairly RAM-hungry, so it runs on an older Ryzen 5 I have.
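
As a quick capacity sanity check: RAID-Z2 spends two drives’ worth of space on parity, so usable space is roughly (drives minus parity) times drive size. The sketch below is back-of-the-envelope arithmetic only; it deliberately ignores ZFS metadata, padding, reserved slop space, and the TB-versus-TiB difference, all of which lower the real figure.

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Rough usable capacity of a single RAID-Z vdev (ignores ZFS overhead)."""
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_tb


print(raidz_usable_tb(6, 8, parity=2))  # RAID-Z2: any two drives may fail -> 32
```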

Hypervisor

Another whiteboxed server, this time a VMware ESXi host with a small Intel Xeon in it. This is the primary host for many internal services, media services, and lab VMs. It also has a 4x1GbE NIC, but regrettably doesn’t support LACP, so we’ll be using a static LAG here.
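
One thing to keep in mind with any LAG, static or LACP: links are typically chosen per flow by hashing header fields, not per packet. The Python sketch below uses CRC32 over the source/destination IPs as a stand-in hash; that choice, and the example addresses, are hypothetical, since real switches and ESXi use their own (often configurable) hashing. It shows why a 4x1GbE bundle adds aggregate bandwidth across many flows while any single flow stays on one 1 Gbps link.

```python
import zlib

LAG_MEMBERS = 4  # 4x1GbE bundle, as on the NAS and the hypervisor


def pick_member(src_ip: str, dst_ip: str, members: int = LAG_MEMBERS) -> int:
    """Hypothetical L3 hash: map a flow to one physical link in the LAG.

    Every packet of the same src/dst pair lands on the same link, so frames
    stay in order, but that flow can never exceed a single link's speed.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % members


# Hypothetical flows between hosts on different segments.
flows = [
    ("10.99.0.10", "10.99.0.20"),
    ("10.30.0.15", "10.99.0.20"),
    ("10.10.0.50", "10.30.0.15"),
]
for src, dst in flows:
    print(src, dst, "-> link", pick_member(src, dst))
```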

NEMS

NEMS is a Raspberry Pi-based Nagios server and will be the primary monitor for our entire architecture. I like NEMS because it’s compatible with both the Pi and the little 5" TFT screen that I have. This means the server closet has a physical monitor with visible alarms on it, should I happen to be in there breaking things. I could easily virtualize Nagios on the hypervisor, but keeping it standalone means it can also monitor the hypervisor.

PiHole

PiHole is a DNS sinkhole that will block ads on our network at the DNS level. Both of our internal name servers will forward requests to it. This doesn’t have to be a physical device; you could easily virtualize it. I choose to keep it physical for the same reason I keep NEMS physical: I don’t want the status of the hypervisor to affect all outbound DNS. Note that you could run two of these for redundancy, or even virtualize one of them. I currently run a single one.
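
The sinkhole idea itself is small: if a queried name is on a blocklist, answer with an unroutable address instead of forwarding the query. The Python sketch below illustrates that idea with hypothetical blocklist entries and a stand-in upstream resolver. Replying with 0.0.0.0 is one common approach; Pi-hole’s actual matching rules and reply mode are configurable, and this sketch’s choice to also block every subdomain of a listed name is a design decision of the sketch, not necessarily Pi-hole’s behavior.

```python
# Hypothetical blocklist entries, for illustration only.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}
SINKHOLE_ADDR = "0.0.0.0"


def is_blocked(name: str) -> bool:
    """True if the name or any of its parent domains is on the blocklist."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in BLOCKLIST for i in range(len(labels)))


def answer(name: str, upstream) -> str:
    """Return the sinkhole address for blocked names, otherwise ask upstream."""
    return SINKHOLE_ADDR if is_blocked(name) else upstream(name)


def fake_upstream(name: str) -> str:
    """Stand-in for a real recursive resolver (documentation address)."""
    return "192.0.2.1"


print(answer("cdn.ads.example.com", fake_upstream))  # 0.0.0.0 (parent domain listed)
print(answer("example.org", fake_upstream))          # 192.0.2.1
```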

Secondary Domain Name Server

My primary DNS server (NS1) is a virtual machine on the hypervisor. The secondary (NS2) runs on a Raspberry Pi with the Raspberry Pi flavor of Ubuntu. Both servers run bind9, and the primary (NS1) is configured to allow zone transfers to NS2. Both of these could be virtual, but again, redundancy.
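
For context on how the secondary decides when to pull a zone: it compares the SOA serial it holds with the primary’s (typically after the primary sends a NOTIFY) and only transfers when the primary’s serial is newer. Serials are compared with RFC 1982 serial number arithmetic, so they can wrap around 2^32. The Python sketch below implements just that comparison; the function names are hypothetical and this is not bind9 code.

```python
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 comparison: is serial s1 'newer' than s2?"""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def needs_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A secondary pulls the zone only when the primary's SOA serial is newer."""
    return serial_gt(primary_serial, secondary_serial)


print(needs_transfer(2024010102, 2024010101))  # True: primary bumped its serial
print(needs_transfer(1, 2**32 - 1))            # True: serial wrapped around
```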

Chapter 2

Routing

The complicated part. Let’s get started and break down how we’re segmenting this.


Overview

Chapter 3

Networks

We’re going to be covering about a dozen different segments here.

Be aware that the VLANs and subnets I show here are purely examples; feel free to use whatever RFC 1918 addresses you want.

Network              VLAN   Default Subnet
Generic Users        1100   10.10.0.0/24
Guest Users          1200   10.20.0.0/24
Management           1990   10.99.0.0/24
Internal Services    1300   10.30.0.0/16
DNS                  1900   10.90.0.0/16
WAN Exposure         1800   10.80.0.0/24
IoT                  1700   10.70.0.0/16
Security Systems     1750   10.75.0.0/16
Quarantine           1690   10.69.0.0/24
Printer              1420   10.42.0.0/24
Internal/External    7001   172.21.1.0/29
Router Loopbacks     N/A    10.100.0.0/16
Dead End             666    N/A
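
With this many segments, it is easy to fat-finger a prefix. The Python sketch below checks a plan for the two mistakes that matter most: subnets outside RFC 1918 space and subnets that overlap. It uses the subnets as listed on each network’s own page below (notably /16 for Internal Services and DNS); Dead End is omitted because it has no subnet. The `check_plan` helper is hypothetical.

```python
import ipaddress
from itertools import combinations

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

PLAN = {
    "Generic Users": "10.10.0.0/24",
    "Guest Users": "10.20.0.0/24",
    "Management": "10.99.0.0/24",
    "Internal Services": "10.30.0.0/16",
    "DNS": "10.90.0.0/16",
    "WAN Exposure": "10.80.0.0/24",
    "IoT": "10.70.0.0/16",
    "Security Systems": "10.75.0.0/16",
    "Quarantine": "10.69.0.0/24",
    "Printer": "10.42.0.0/24",
    "Internal/External": "172.21.1.0/29",
    "Router Loopbacks": "10.100.0.0/16",
}


def check_plan(plan: dict[str, str]) -> list[str]:
    """Return a list of problems: non-RFC 1918 subnets and overlapping pairs."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []
    for name, net in nets.items():
        if not any(net.subnet_of(block) for block in RFC1918):
            problems.append(f"{name} ({net}) is not RFC 1918 space")
    for (a, na), (b, nb) in combinations(nets.items(), 2):
        if na.overlaps(nb):
            problems.append(f"{a} ({na}) overlaps {b} ({nb})")
    return problems


print(check_plan(PLAN) or "plan OK")
```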


Users

VLAN   Default Subnet(s)
1100   10.10.0.0/24

This segment is for stuff like my desktop, laptop dock, phone, trusted IoT devices, etc.

Guest Users

VLAN   Default Subnet(s)
1200   10.20.0.0/24

Ideally, any endpoint devices that I don’t own go here. This should be the default home for newly connected WiFi devices. This VLAN will have access to DNS, the media server front ends, and any local game servers. Its clients will not be able to access any management interfaces, the management segment, or SSH on any local device.

Management

VLAN   Default Subnet(s)
1990   10.99.0.0/24

Here’s where the NAS, the ESXi host, routers, and switches all live, at least for management purposes. This is primarily meant to be a single network segment because it allows us to designate a port on the switch specifically for recovery purposes. Because virtually everything is reachable on the same VLAN, I can plug a computer straight into the port tagged for this network and repair whatever outage I’ve caused.

Although this network was not originally intended to be a high-throughput VLAN, I am using this segment for all direct connections between the bare-metal ESXi host and the NAS. The NAS lives natively here, but be aware that I am putting all media devices on a different segment. This means that the ESXi host itself can reach the NAS on the same L2 domain, while all services still need to pass through the firewall. Depending on how you want to mount the NAS into VMs/containers/services, this lets you apply a firewall layer on demand.

Internal Services

VLAN   Default Subnet(s)
1300   10.30.0.0/16

Big data happens here. This segment is the primary motivator for designating three LAGs on the network. Many VMs will be native on this VLAN, including virtually all of the media-related VMs/containers. This is the default landing zone for new services I spin up (that don’t require quarantining, anyway).

DNS

VLAN   Default Subnet(s)
1900   10.90.0.0/16

All of our DNS servers exist here, including the PiHole. Consider the following IP assignment:

Server   IP/CIDR           Gateway
NS1      10.90.10.10/28    10.90.10.14
NS2      10.90.20.20/28    10.90.20.30
PiHole   10.90.10.5/28     10.90.10.14

Personally, I like NS1’s IP ending in 10.10 and NS2’s IP ending in 20.20. “But wait!” you might say, “That’s two gateways on the same VLAN!” Yes: NS1 and the PiHole sit in 10.90.10.0/28 while NS2 sits in 10.90.20.16/28, so the internal router needs an address in each /28, and we’re going to handle that with virtual IPs. Not to get too far ahead of ourselves, but in OPNsense, Interfaces > Virtual IPs > Settings > Add contains the solution. We’ll go into greater detail in the router setup section.
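
A /28 holds 16 addresses, 14 of them usable, and in the DNS table each gateway is the last usable host of its /28. The Python sketch below checks that arithmetic for the three DNS hosts using the standard `ipaddress` module; the `check_gateway` helper is hypothetical and only validates addressing, it does not configure anything.

```python
import ipaddress

# Addresses from the DNS table.
DNS_HOSTS = {
    "NS1": ("10.90.10.10/28", "10.90.10.14"),
    "NS2": ("10.90.20.20/28", "10.90.20.30"),
    "PiHole": ("10.90.10.5/28", "10.90.10.14"),
}


def check_gateway(iface: str, gateway: str):
    """Return (network, gateway_is_last_usable_host); raise if the gateway is off-subnet."""
    net = ipaddress.ip_interface(iface).network
    gw = ipaddress.ip_address(gateway)
    if gw not in net:
        raise ValueError(f"gateway {gw} is outside {net}")
    last_host = net.broadcast_address - 1  # highest address before broadcast
    return net, gw == last_host


for name, (iface, gw) in DNS_HOSTS.items():
    net, is_last = check_gateway(iface, gw)
    print(f"{name:<7} {net!s:<16} gateway {gw:<12} last usable host: {is_last}")
```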

DMZ/WAN Exposure

VLAN   Default Subnet(s)
1800   10.80.0.0/24

This segment is really for two types of devices: the reverse proxy that is exposed to the WAN, and any game servers/VMs. It should be fairly obvious why these are segmented; they will be somewhat restricted in what they can communicate with on the internal network. A small reminder: because our Edge router handles NAT for us, Destination NAT (DNAT, aka port forwarding) for these devices will be handled at the Edge.
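
Conceptually, DNAT at the Edge is a lookup from (protocol, WAN port) to an internal host and port, with anything unmatched dropped. The Python sketch below models that table and adds a guard that refuses any forward pointing outside the WAN Exposure segment. The forwarded ports and the 10.80.0.10/10.80.0.20 addresses are hypothetical examples, and this is a model of the policy, not Edge router configuration.

```python
import ipaddress

DMZ = ipaddress.ip_network("10.80.0.0/24")  # WAN Exposure segment, VLAN 1800

# Hypothetical forwards: (protocol, WAN port) -> (DMZ host, port)
PORT_FORWARDS = {
    ("tcp", 443): ("10.80.0.10", 443),      # reverse proxy (hypothetical address)
    ("udp", 27015): ("10.80.0.20", 27015),  # game server (hypothetical address)
}


def validate_forwards(forwards: dict) -> None:
    """Refuse any forward whose target lies outside the DMZ segment."""
    for rule, (host, _port) in forwards.items():
        if ipaddress.ip_address(host) not in DMZ:
            raise ValueError(f"{rule} forwards to {host}, outside {DMZ}")


def translate(proto: str, wan_port: int):
    """Return the DNAT target for an inbound packet, or None to drop it."""
    return PORT_FORWARDS.get((proto, wan_port))


validate_forwards(PORT_FORWARDS)
print(translate("tcp", 443))  # ('10.80.0.10', 443)
print(translate("tcp", 22))   # None: no forward, so unsolicited traffic is dropped
```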

IoT

VLAN   Default Subnet(s)
1700   10.70.0.0/16

Okay, this is the first potentially complicated segment, because depending on your needs, you might actually want it to be three different segments with varying levels of control and severity:

  • Highest severity - This is going to be all the devices that you don’t understand why they have WiFi/internet access but maybe still want to let them do things. This is your refrigerator, washing machine, coffee maker, etc. This segment would be allowed access to the internet and that’s it. They wouldn’t even get access to the internal DNS. They can’t interact with other endpoints locally; they can only reach out and call home.
  • Medium severity - These are devices that desire moderate communication with other devices on the local network. I don’t have any examples of these, but maybe your thermostat needs this level.
  • Low severity - This is where devices that are fairly trusted but still IoT go. This could be things you build yourself with ESP8266 boards, a Hue bridge, or an OctoPi server for your 3D printer. They have no outbound SSH ability, but can access the internet, and local clients can reach them.
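
The three tiers can be expressed as a small default-deny policy table. In the Python sketch below, the zone names are hypothetical, and two details are assumptions rather than statements from the tiers above: which local zones the medium tier may reach, and blocking SSH for every tier (the text only specifies it for the low tier).

```python
# Hypothetical zone names: "wan" = internet, "dns" = internal name servers,
# "users" = the Generic Users segment.
OUTBOUND_ZONES = {
    "high": {"wan"},                    # internet only, not even internal DNS
    "medium": {"wan", "dns", "users"},  # assumption: moderate local reach
    "low": {"wan", "dns"},
}
REACHABLE_FROM_USERS = {"low"}  # local clients may connect to low-severity devices
SSH_PORT = 22


def outbound_allowed(tier: str, dest_zone: str, port: int) -> bool:
    """Default deny: unknown tiers or zones get nothing, and SSH is always blocked."""
    if port == SSH_PORT:
        return False
    return dest_zone in OUTBOUND_ZONES.get(tier, set())


def inbound_from_users_allowed(tier: str) -> bool:
    """Only the low-severity tier accepts connections initiated by local users."""
    return tier in REACHABLE_FROM_USERS


print(outbound_allowed("high", "wan", 443))  # True: calling home is fine
print(outbound_allowed("high", "dns", 53))   # False: no internal DNS
```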

Security Systems

VLAN   Default Subnet(s)
1750   10.75.0.0/16

In my network, I do not want security systems to be able to access the WAN or be accessed from outside. They will need access to various appliances, such as the NAS and frontend servers. The goal is to keep them isolated from everything else. If your cameras require access to the WAN, treat them much like the medium- or highest-severity IoT networks.

Quarantine

VLAN   Default Subnet(s)
1690   10.69.0.0/24

The step-child of networks. This is primarily for projects where I’m standing up VMs that I do not want to touch anything else, but desire some level of network connectivity for reasons. This firewall profile will be the most mutable in my network, as rules will change based on my immediate project goals. Sometimes, this segment might only be allowed to talk to itself, sometimes it might have full network access, or anywhere in between.

Printer

VLAN   Default Subnet(s)
1420   10.42.0.0/24

I hate printers. They get their own network where I can lock them down if I need to. Currently this is unused in my network as the printer I have doesn’t do anything I hate. It gets to sit on the WiFi like a real boy.

Routers

VLAN   Default Subnet(s)
7001   172.21.1.0/29
N/A    10.100.0.0/16

Wait, two segments? And one doesn’t have a VLAN?

Let’s talk about the first subnet, 172.21.1.0/29. Why is it numbered so differently from the other networks? Because it is the network segment in which the two routers form their OSPF adjacency. No device other than these two routers needs to talk to these IPs, except that NEMS is also allowed to ping them for monitoring purposes. For example, consider the following addressing in our scheme:

Router     Interface          Address
Edge       eth1               172.21.1.1/29
Internal   igc0/eth0          172.21.1.2/29
Edge       lo0 (loopback)     10.100.1.0/32
Internal   lo1 (loopback)     10.100.2.0/32

The first two entries are the interfaces on each router that will be active in the OSPF process. Don’t sweat this too much; just be aware that the routers send OSPF hellos to each other from those interfaces and listen for Link State Advertisements (LSAs).

The second two entries are the loopback interfaces for each router, whose addresses will be used as the Router IDs for the OSPF process. Note that they are /32 addresses because they are purely routed interfaces with no Layer 2 component. Don’t sweat this either. You don’t have to understand OSPF to follow the instructions later.
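
For the curious, the addressing above can be sanity-checked with Python’s standard `ipaddress` module: both OSPF interfaces must share the /29 transit subnet to become neighbors, a /29 leaves six usable addresses, and each /32 loopback holds exactly one address, which doubles as that router’s Router ID. This is only an arithmetic check of the table, not router configuration.

```python
import ipaddress

TRANSIT = ipaddress.ip_network("172.21.1.0/29")
TRANSIT_IFACES = {
    "Edge eth1": ipaddress.ip_interface("172.21.1.1/29"),
    "Internal igc0/eth0": ipaddress.ip_interface("172.21.1.2/29"),
}
LOOPBACKS = {
    "Edge": ipaddress.ip_interface("10.100.1.0/32"),
    "Internal": ipaddress.ip_interface("10.100.2.0/32"),
}

# Both OSPF interfaces must sit in the transit subnet to become neighbors.
assert all(i.network == TRANSIT for i in TRANSIT_IFACES.values())

usable = TRANSIT.num_addresses - 2  # minus the network and broadcast addresses
print(f"{TRANSIT}: {usable} usable addresses")  # 172.21.1.0/29: 6 usable addresses

# A /32 holds exactly one address, which doubles as the OSPF Router ID.
for router, lo in LOOPBACKS.items():
    assert lo.network.num_addresses == 1
    print(f"{router} router-id {lo.ip}")
```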

Dead End

VLAN   Default Subnet(s)
666    N/A

This isn’t a real network, but simply the name we give to our administratively disconnected VLAN. More on this in the Switching section. There is no gateway for this segment.

Chapter 4

Firewall

Lorem Ipsum.

OSPF

Chapter 5

Switching

Let’s take a look at how we’re configuring our Layer 2.


Overview

Ports

Protocols

Chapter 6

Services

Lorem Ipsum.


DNS

Dynamic DNS