We can initially divide Azure Networking offerings into two big groups: resources that can be deployed within Virtual Networks and those that can’t.
In this article I’d like to focus mostly on Virtual Networks and some of their features and particularities, so the reader can reinforce their understanding of how networking works in the cloud.
Virtual Networks are a logical unit of network isolation inside Azure regions and are the Azure equivalent of a layer 3 network in a physical network.
Virtual Networks are defined with a name and one or more IP address spaces. Each address space can be sliced up in subnets.
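To make the relationship between address spaces and subnets concrete, here is a minimal sketch using Python’s standard ipaddress module. The 10.0.0.0/16 address space and the /24 subnet size are assumptions picked for illustration, and slice_address_space is a hypothetical helper, not an Azure API.

```python
import ipaddress
from itertools import islice


def slice_address_space(cidr, new_prefix, count):
    """Return the first `count` subnets of size `new_prefix` inside `cidr`."""
    address_space = ipaddress.ip_network(cidr)
    return list(islice(address_space.subnets(new_prefix=new_prefix), count))


# Hypothetical Virtual Network address space, sliced into three /24 subnets.
subnets = slice_address_space("10.0.0.0/16", 24, 3)
for subnet in subnets:
    print(subnet, "holds", subnet.num_addresses, "addresses")
```

Every subnet produced this way is guaranteed to sit inside the address space it was cut from, which is the same constraint a Virtual Network places on its subnets.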
Resources deployed in a Virtual Network are not reachable from outside the Virtual Network unless explicitly permitted. Note, however, that many of the deployments customers can do from the portal, such as deploying a VM in a Virtual Network, do permit outside access by default.
Network isolation is accomplished by encapsulating Virtual Network-specific traffic inside a GRE tunnel with a specific key, unique to that Virtual Network. This does not provide traffic encryption inside Microsoft’s datacentres; it only provides isolation between Virtual Networks.
All VMs participating in the same Virtual Network are able to connect to each other, including to any gateways in the Virtual Network, by sending traffic inside a GRE tunnel with the same key.
The GRE key is not exposed to users, nor is there any need for it to be.
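As an illustration of what a keyed GRE header looks like, the sketch below packs and parses the base header with the Key extension described in RFC 2890. The key value and the helper names are hypothetical, and this is not a claim about the exact on-wire format Azure uses. It only shows how a per-network key can travel with every encapsulated packet.

```python
import struct
from typing import Optional

GRE_CHECKSUM_PRESENT = 0x8000  # C bit in the GRE flags field
GRE_KEY_PRESENT = 0x2000       # K bit in the GRE flags field (RFC 2890)
PROTO_IPV4 = 0x0800            # EtherType of the encapsulated payload


def build_gre_header(key: int, protocol: int = PROTO_IPV4) -> bytes:
    """Build an 8-byte GRE header that carries only the Key extension."""
    if not 0 <= key <= 0xFFFFFFFF:
        raise ValueError("GRE key must fit in 32 bits")
    return struct.pack("!HHI", GRE_KEY_PRESENT, protocol, key)


def parse_gre_key(header: bytes) -> Optional[int]:
    """Return the GRE key, or None when the K bit is not set."""
    flags, _protocol = struct.unpack("!HH", header[:4])
    if not flags & GRE_KEY_PRESENT:
        return None
    # An optional checksum word, if present, sits before the key.
    offset = 8 if flags & GRE_CHECKSUM_PRESENT else 4
    (key,) = struct.unpack("!I", header[offset:offset + 4])
    return key


# Hypothetical key standing in for one Virtual Network.
header = build_gre_header(0x1234)
print(header.hex(), parse_gre_key(header))
```

A receiver that only accepts packets whose key matches its own network discards everything else, which is isolation without encryption.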
Layer 2 and reachability
Layer 2 traffic inside a Virtual Network is intercepted by the platform for the purpose of adding platform features such as traffic control, efficient routing, optimizations, etc. This becomes obvious when you check the ARP table on a VM deployed inside a Virtual Network. If you dump the ARP table you should see that every Virtual Network IP address (including the gateway’s!) has resolved to the same MAC address: 12:34:56:78:9A:BC
VMs in each subnet, even those in different address spaces, can by default reach every other resource in the Virtual Network without the need for a layer 3 device (or virtual device) to route packets between the different subnets inside the Virtual Network. This is one of Azure’s optimizations at the platform layer and it’s done by the platform’s Virtual Switch. From the VM’s perspective this is accomplished by sending traffic to the default gateway in the VM NIC’s configuration.
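If you want to check this programmatically on a Linux VM, a minimal sketch is to parse the kernel’s ARP table (the /proc/net/arp format) and collect the distinct MAC addresses. The sample text below is made up for illustration. On a real VM you would read the file instead, and the entries you see will differ.

```python
def parse_arp_table(text):
    """Map each IP address to its MAC address from /proc/net/arp style text."""
    entries = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:
            entries[fields[0]] = fields[3].lower()
    return entries


# Hypothetical sample; on a VM use open("/proc/net/arp").read() instead.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.1         0x1         0x2         12:34:56:78:9a:bc     *        eth0
10.0.0.5         0x1         0x2         12:34:56:78:9a:bc     *        eth0
10.0.1.7         0x1         0x2         12:34:56:78:9a:bc     *        eth0
"""

entries = parse_arp_table(SAMPLE)
distinct_macs = set(entries.values())
print(len(entries), "neighbours,", len(distinct_macs), "distinct MAC address(es)")
```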
Reserved IP addresses
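Every subnet in a Virtual Network has its first 3 usable IP addresses reserved for platform use, so e.g. if your subnet is 10.0.0.0/24, the reserved addresses will be 10.0.0.1, 10.0.0.2 and 10.0.0.3. As in any IPv4 subnet, the network address (10.0.0.0) and the broadcast address (10.0.0.255) cannot be assigned either, which makes five unavailable addresses per subnet in total.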
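A quick way to work out which addresses are taken for any subnet is sketched below with Python’s ipaddress module. The helper names are hypothetical. The count of assignable addresses assumes that the network and broadcast addresses are also unavailable, on top of the three platform addresses.

```python
import ipaddress
from itertools import islice


def platform_reserved(cidr):
    """Return the first three host addresses, which the platform keeps."""
    subnet = ipaddress.ip_network(cidr)
    return list(islice(subnet.hosts(), 3))


def assignable_count(cidr):
    """Addresses left for your resources: total minus network, broadcast and 3 reserved."""
    subnet = ipaddress.ip_network(cidr)
    return max(subnet.num_addresses - 5, 0)


reserved = platform_reserved("10.0.0.0/24")
print([str(address) for address in reserved], assignable_count("10.0.0.0/24"))
```

This is handy when sizing small subnets: under the same assumption, a /29 has 8 addresses but leaves only 3 for your own resources.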
Routing inside Azure Virtual Networks
There is a simple rule of thumb you should follow when it comes to routing inside an Azure Virtual Network: never make any changes to the VM’s guest OS routing table. The guest OS routing table will pretty much always have a default route pointing to the first usable IP address in the subnet, so e.g. if the VM is in the 10.0.0.0/24 subnet, the default route will point to 10.0.0.1. This is enough for all your routing needs, as traffic going to 10.0.0.1 will be intercepted by the platform, which applies routing and traffic control rules (among others). Removing the default route or making other changes to the VM’s guest OS routing table will not accomplish any of your objectives and might make you lose connectivity to the VM.
TTL will still be decremented by 1 for traffic between subnets of the same Virtual Network.
DHCP is part of Azure’s Core SDN functionality, which means users should not make changes to DHCP configuration unless Azure’s documentation says so for a specific scenario.
Most if not all VM images are deployed with DHCP enabled on all their interfaces. Azure will deliver IP configuration to all of them, but will only deliver a default route on the VM’s primary interface. Sysadmins can define which interface is the primary one when deploying the VMs.
Virtual Networks provide a DNS forwarder by default, but sysadmins can configure custom DNS servers for Virtual Networks or specific VMs. Please note that DNS configuration is delivered by DHCP, hence a reboot of your VM might be needed after you’ve changed DNS settings.
Azure’s default DNS forwarder has throttling limits, but those are not public and unlikely to be hit by most users.
Virtual Networks do not support jumbo frames (i.e. Ethernet frames carrying a payload bigger than 1500 bytes).
Core SDN functionality
As per Microsoft’s public documentation, core Software Defined Networking (SDN) functionality is offered by a programmable Virtual Switch referred to as VFP (Virtual Filtering Platform).
VFP is implemented on the physical servers that host VMs, making it a scalable solution as it’s not centralised on a specific piece of hardware. VFP accomplishes functions such as NAT, stateful-like firewall, virtual networking encapsulation / decapsulation, routing rules and billing among others.
Think about it as a layer of programmable capabilities that sits between your VMs and the real network, so every inbound and outbound packet is evaluated and treated by VFP.
Advanced Network Services
Basic services such as DHCP and DNS work in a very similar way to legacy networks, but Microsoft has added some advanced services to the Virtual Switch.
Address Translation (or NAT) is done on the virtual switch and not centralised on an edge device, as is more common in legacy networks. This applies to SNAT (outbound traffic), DNAT (inbound traffic) and static NAT (1:1 mapping).
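For readers less familiar with SNAT, here is a generic, simplified sketch of port-based source NAT: each outbound flow gets a port on a shared public IP, and return traffic is mapped back through the same table. The class, its port allocation strategy and the addresses are all hypothetical. This illustrates the general technique and is not how VFP implements it.

```python
from itertools import count


class SnatTable:
    """Toy SNAT table: one public IP, one public port per private flow."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Rewrite the source of an outbound flow, reusing an existing mapping."""
        flow = (private_ip, private_port)
        if flow not in self._outbound:
            public_port = next(self._next_port)
            self._outbound[flow] = public_port
            self._inbound[public_port] = flow
        return (self.public_ip, self._outbound[flow])

    def translate_inbound(self, public_port):
        """Find the private endpoint for return traffic, or None if unknown."""
        return self._inbound.get(public_port)


table = SnatTable("203.0.113.10")  # documentation-range IP used as a placeholder
first = table.translate_outbound("10.0.0.4", 50000)
second = table.translate_outbound("10.0.0.5", 50000)
print(first, second, table.translate_inbound(first[1]))
```

Because the table is just local state, keeping it next to the VM on each host (rather than on one edge device) is what lets this kind of translation scale out.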
Network Security Groups
Network Security Groups (or NSGs) are Azure’s L4 network ACLs. They can be applied directly on your VM’s NIC or per subnet to keep things consistent, but they are always enforced on the virtual switch.
This removes many scaling issues as the processing of the ruleset is done locally and not on a centralised device.
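To give a feel for how such a ruleset is processed, below is a simplified sketch of priority-ordered, first-match evaluation, where a lower priority number is evaluated first. The Rule fields and the final implicit deny are simplifying assumptions for illustration. Real NSG rules have more fields, and NSGs ship with their own default rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int             # lower number is evaluated first
    protocol: str             # "tcp", "udp" or "*" for any
    source: str               # source prefix in CIDR notation
    dest_port: Optional[int]  # None matches any destination port
    allow: bool


def evaluate(rules, protocol, source_ip, dest_port):
    """Return True if the first matching rule allows the packet."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.protocol not in ("*", protocol):
            continue
        if address not in ipaddress.ip_network(rule.source):
            continue
        if rule.dest_port is not None and rule.dest_port != dest_port:
            continue
        return rule.allow
    return False  # simplification: deny when nothing matches


# Hypothetical ruleset: HTTPS from anywhere, SSH from 10.0.0.0/8 only.
rules = [
    Rule(100, "tcp", "0.0.0.0/0", 443, True),
    Rule(150, "tcp", "10.0.0.0/8", 22, True),
    Rule(200, "*", "0.0.0.0/0", None, False),
]
print(evaluate(rules, "tcp", "203.0.113.7", 443), evaluate(rules, "tcp", "203.0.113.7", 22))
```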
One optimisation Microsoft has added to several of their Virtual Machine offerings is Accelerated Networking, a feature that reduces CPU usage, jitter and latency for all those VMs with the feature enabled, as well as increasing throughput for some specific VM SKUs.
We have learned that both NAT and NSGs are handled on the virtual switch. That means every single packet going in or out of a VM has to be inspected by the virtual switch (matched against the NSG ruleset, any NAT rules, etc.). This of course has a computational cost that shows up as latency and jitter. Accelerated Networking uses technologies like SR-IOV to offload established connections so they are handled entirely by what Microsoft calls FPGA-based SmartNICs, reducing latency within the same region down to microseconds. Mark Russinovich gave a demo at Build 2018 (available as a video) where he showcases this and DPDK (not available yet).
As you have now seen, virtual networking is not so dissimilar from legacy networking, but there are some particularities that you need to be aware of.
Did I forget any features or particularities that you would like me to cover in the article? Let me know in the comments!