AWS to Azure S2S VPN

This article will show you how to use the native Azure and AWS VPN solutions to create a full VPN mesh between Azure and AWS, with dual gateways on both sides and active-active connectivity across the board.

As far as I know this is the first time this kind of setup is publicly documented, so enjoy the front-row seat!

A bit of context…

Establishing a site to site (S2S) VPN between AWS and Azure was a less than ideal process, as it required at least one of the sides to use a 3rd party network virtual appliance (NVA) such as Windows RRAS, a virtual firewall from one of the available vendors, or a Linux box running VPN software such as strongSwan.

Not only was the setup process cumbersome, it also meant you had to build your own HA capabilities on whichever side you decided to deploy your 3rd party NVA, and design complex setups for automated failover with minimal downtime.

Why was it like that?

The reason why you had to use a 3rd party NVA on at least one side is simple: the AWS and Azure native VPN solutions were not compatible.

AWS chose to only support IKEv1 for their native S2S VPN solution, whilst Azure chose to move to IKEv2 only for theirs. This created a situation where their VPN solutions simply couldn’t agree on which IKE version to use when establishing a S2S tunnel.

As a side note, Azure does have a VPN gateway SKU that supports IKEv1: the Basic SKU, which is not supported for production environments. It also has other interoperability issues with the AWS VPN service, so the connection is still not possible.

What has changed?

AWS has announced they now support IKEv2, which is great news for everyone, but mainly for their customers and those who have to interact with AWS VPCs.

Does that mean that a VPN between Azure and AWS is now possible? Yes, but with some caveats.

What can be accomplished now?

Right now you can configure tunnels between AWS and Azure that take advantage of the HA possibilities on both sides, with just a few caveats:

  1. You can’t use BGP
  2. You need to create two VPN connections on the AWS side to achieve active-active on both sides; otherwise only the AWS side is active-active (using both VGW IPs), while sending all traffic to just one of the VPN GW instances on Azure.

Scenario

We’re going to configure a site to site connection between Azure and AWS. Unfortunately we can’t use BGP at the moment, because AWS forces you to use APIPA addresses for the tunnel’s inside IPs, which is also where BGP listens; Azure, on the other hand, forces you to use the last available IP address on the gateway subnet as the BGP listener. These two requirements are not compatible.

We will still use dual tunnels on the AWS side and, eventually, active-active on the Azure side. I’ll initially build active-standby from Azure (i.e. only one VPN GW public IP) and test connectivity. This is the initial scenario (please ignore the IP addresses in the diagram):

Diagram: “Multiple On-Premises VPN”, from Microsoft’s “About highly available connections” documentation.

Only then will I add the second VPN GW instance on the Azure side to accomplish this scenario:

Diagram: “Dual Redundancy”, from Microsoft’s “About highly available connections” documentation.

AWS Configuration

VPC IP addressing: 172.31.0.0/16
Region: us-east-2
VGW IP 1: 18.220.213.254
VGW IP 2: 52.15.136.135
CGW: 52.174.95.131

Azure Configuration

VNet IP addressing: 10.0.0.0/16
Region: West Europe
VPN GW Public IP: 52.174.95.131
Local Network Gateway 1: 18.220.213.254
Local Network Gateway 2: 52.15.136.135

Step by step setup

Bring up the first tunnel

  1. Create the Azure VNet and the AWS VPC as per the above settings
  2. Create the Azure VPN GW (do this first as it takes a while to get created)
  3. Create the AWS VGW and attach it to your VPC
  4. Create the VPN connection on the AWS side as per the above settings: no BGP (static routing only), and leave the inside tunnel addresses and PSKs to be generated automatically
  5. Create the new connection in Azure pointing to the first VGW IP, which you’ll first need to add as a Local Network Gateway. Use the matching PSK from the downloaded AWS configuration. At this point your first tunnel will come up. A CLI sketch of these steps follows right after this list.
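
If you prefer the command line over the portals, here is a rough sketch of the same steps using the Azure CLI and the AWS CLI. All resource names, IDs, the resource group and the gateway subnet prefix below are placeholders I’m using for illustration; substitute your own values and double-check the flags against your CLI versions:

# --- Azure side: VNet, gateway subnet and VPN gateway (steps 1 and 2) ---
az group create --name rg-vpn --location westeurope
az network vnet create --resource-group rg-vpn --name vnet-azure --address-prefixes 10.0.0.0/16
az network vnet subnet create --resource-group rg-vpn --vnet-name vnet-azure \
    --name default --address-prefixes 10.0.1.0/24
az network vnet subnet create --resource-group rg-vpn --vnet-name vnet-azure \
    --name GatewaySubnet --address-prefixes 10.0.255.0/27
az network public-ip create --resource-group rg-vpn --name azure-vpn-gw-pip1
# The gateway takes a while to provision, hence --no-wait
az network vnet-gateway create --resource-group rg-vpn --name azure-vpn-gw \
    --vnet vnet-azure --public-ip-addresses azure-vpn-gw-pip1 \
    --gateway-type Vpn --vpn-type RouteBased --sku VpnGw1 --no-wait

# --- AWS side: VGW, customer gateway and VPN connection (steps 3 and 4) ---
# (assumes the AWS CLI is configured for us-east-2)
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-xxxxxxxx --vpc-id vpc-xxxxxxxx
# The customer gateway is the Azure VPN gateway's public IP
aws ec2 create-customer-gateway --type ipsec.1 --bgp-asn 65000 --public-ip 52.174.95.131
# Static routing (no BGP); AWS generates the inside tunnel addresses and PSKs
aws ec2 create-vpn-connection --type ipsec.1 \
    --customer-gateway-id cgw-xxxxxxxx --vpn-gateway-id vgw-xxxxxxxx \
    --options '{"StaticRoutesOnly":true}'
aws ec2 create-vpn-connection-route --vpn-connection-id vpn-xxxxxxxx \
    --destination-cidr-block 10.0.0.0/16

# --- Back on Azure: local network gateway and connection for tunnel 1 (step 5) ---
az network local-gateway create --resource-group rg-vpn --name aws-vgw-1 \
    --gateway-ip-address 18.220.213.254 --local-address-prefixes 172.31.0.0/16
az network vpn-connection create --resource-group rg-vpn --name to-aws-tunnel-1 \
    --vnet-gateway1 azure-vpn-gw --local-gateway2 aws-vgw-1 \
    --shared-key '<PSK from the downloaded AWS configuration>'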

Bring up the second tunnel

  1. Create a second connection on the Azure side, for which you will have to create a new Local Network Gateway with the 2nd VGW’s IP address and the same AWS address space (see the sketch right below).
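
The same step as a CLI sketch, reusing the placeholder names from above:

# Second local network gateway, pointing at the second AWS tunnel endpoint,
# with the same AWS address space (172.31.0.0/16)
az network local-gateway create --resource-group rg-vpn --name aws-vgw-2 \
    --gateway-ip-address 52.15.136.135 --local-address-prefixes 172.31.0.0/16
az network vpn-connection create --resource-group rg-vpn --name to-aws-tunnel-2 \
    --vnet-gateway1 azure-vpn-gw --local-gateway2 aws-vgw-2 \
    --shared-key '<PSK for tunnel 2 from the AWS configuration>'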

And that’s it, you should now see both tunnels up. Yeah, it’s that easy!

Routing

Now that the tunnels are up, the Azure VNet effectively learns the routes to the networks on the other side of the tunnels. You can check those routes in the effective routes of the NIC of any VM deployed in the VNet.
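
For example, with the Azure CLI (the NIC name below is a placeholder, not the actual one from this lab):

# Effective routes for the VM's NIC; once the tunnels are up you should see
# 172.31.0.0/16 with next hop type VirtualNetworkGateway
az network nic show-effective-route-table --resource-group rg-vpn \
    --name vpntest-azure-nic --output table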


The AWS side requires you to enable route propagation on the VPC’s route table first. Once enabled, you should immediately see the propagated route to 10.0.0.0/16 in the route table.
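
The same with the AWS CLI, as a sketch (route table and gateway IDs are placeholders):

# Enable propagation of VGW routes into the VPC route table
aws ec2 enable-vgw-route-propagation --route-table-id rtb-xxxxxxxx --gateway-id vgw-xxxxxxxx
# Verify the propagated route towards the Azure VNet (10.0.0.0/16) shows up
aws ec2 describe-route-tables --route-table-ids rtb-xxxxxxxx --query 'RouteTables[0].Routes'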

Testing

I have deployed a VM on each side of the tunnel and I’ll do some testing. I’ll start with just ICMP, but remember ICMP is stateless, so it plays quite well with asymmetric routing.

Next up we’ll run some iPerf tests to show TCP connectivity. As there are no stateful devices receiving traffic on different interfaces, the potential asymmetry in this setup should not be a problem.

AWS VM IP address: 172.31.33.246
AWS VM size: t2.micro
Azure VM IP address: 10.0.1.4
Azure VM size: A0

ICMP test

We’ll ping the AWS VM from the Azure VM, capture the output of the ping command on the Azure side and the output of a tcpdump (only the on-screen output) on the AWS side.

Azure

pjperez@vpntest-Azure:~$ ping 172.31.33.246
PING 172.31.33.246 (172.31.33.246) 56(84) bytes of data.
64 bytes from 172.31.33.246: icmp_seq=1 ttl=254 time=109 ms
64 bytes from 172.31.33.246: icmp_seq=2 ttl=254 time=109 ms
64 bytes from 172.31.33.246: icmp_seq=3 ttl=254 time=110 ms
64 bytes from 172.31.33.246: icmp_seq=4 ttl=254 time=108 ms
64 bytes from 172.31.33.246: icmp_seq=5 ttl=254 time=110 ms
^C
--- 172.31.33.246 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 108.781/109.743/110.653/0.611 ms

AWS

[root@ip-172-31-33-246 ~]# tcpdump host 10.0.1.4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:10:12.944619 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 1, length 64
12:10:12.944647 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 1, length 64
12:10:13.945001 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 2, length 64
12:10:13.945029 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 2, length 64
12:10:14.946326 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 3, length 64
12:10:14.946350 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 3, length 64
12:10:15.945779 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 4, length 64
12:10:15.945807 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 4, length 64
12:10:16.946711 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 5, length 64
12:10:16.946738 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 5, length 64
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

As you can see there’s no packet loss and the latency is more or less what you’d expect for a transatlantic connection.

TCP (iPerf3) test

In this case we’ll start an iPerf3 server on the AWS side and push traffic with 8 parallel streams from the Azure side. I will only share the final result from each side for brevity.
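
For reference, the commands were along these lines (the exact invocations weren’t captured, so this is a reconstruction):

# On the AWS VM: run iperf3 in server mode
iperf3 -s
# On the Azure VM: push traffic to the AWS VM over 8 parallel streams for 10 seconds
iperf3 -c 172.31.33.246 -P 8 -t 10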

Azure (Client)


[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 14.3 MBytes 12.0 Mbits/sec 3 sender
[ 4] 0.00-10.00 sec 12.6 MBytes 10.6 Mbits/sec receiver
[ 6] 0.00-10.00 sec 21.9 MBytes 18.3 Mbits/sec 6 sender
[ 6] 0.00-10.00 sec 19.0 MBytes 15.9 Mbits/sec receiver
[ 8] 0.00-10.00 sec 12.9 MBytes 10.8 Mbits/sec 3 sender
[ 8] 0.00-10.00 sec 11.2 MBytes 9.41 Mbits/sec receiver
[ 10] 0.00-10.00 sec 14.9 MBytes 12.5 Mbits/sec 8 sender
[ 10] 0.00-10.00 sec 13.1 MBytes 11.0 Mbits/sec receiver
[ 12] 0.00-10.00 sec 11.0 MBytes 9.21 Mbits/sec 6 sender
[ 12] 0.00-10.00 sec 9.38 MBytes 7.87 Mbits/sec receiver
[ 14] 0.00-10.00 sec 15.2 MBytes 12.7 Mbits/sec 5 sender
[ 14] 0.00-10.00 sec 13.2 MBytes 11.1 Mbits/sec receiver
[ 16] 0.00-10.00 sec 16.0 MBytes 13.4 Mbits/sec 8 sender
[ 16] 0.00-10.00 sec 13.7 MBytes 11.5 Mbits/sec receiver
[ 18] 0.00-10.00 sec 24.3 MBytes 20.4 Mbits/sec 2 sender
[ 18] 0.00-10.00 sec 21.4 MBytes 17.9 Mbits/sec receiver
[SUM] 0.00-10.00 sec 130 MBytes 109 Mbits/sec 41 sender
[SUM] 0.00-10.00 sec 114 MBytes 95.2 Mbits/sec receiver
iperf Done.

You’ll see there are a number of retransmissions and an overall throughput of roughly 100 Mbps. This is limited by the VM sizes on both sides, as I’m using an A0 on Azure (the smallest, slowest, most restricted SKU) and a t2.micro on the AWS side.

AWS (Server)

[ ID] Interval           Transfer     Bandwidth
[ 5] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.13 sec 12.6 MBytes 10.4 Mbits/sec receiver
[ 7] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 7] 0.00-10.13 sec 19.0 MBytes 15.7 Mbits/sec receiver
[ 9] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 9] 0.00-10.13 sec 11.2 MBytes 9.29 Mbits/sec receiver
[ 11] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 11] 0.00-10.13 sec 13.1 MBytes 10.9 Mbits/sec receiver
[ 13] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 13] 0.00-10.13 sec 9.38 MBytes 7.77 Mbits/sec receiver
[ 15] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 15] 0.00-10.13 sec 13.2 MBytes 10.9 Mbits/sec receiver
[ 17] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 17] 0.00-10.13 sec 13.7 MBytes 11.3 Mbits/sec receiver
[ 19] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 19] 0.00-10.13 sec 21.4 MBytes 17.7 Mbits/sec receiver
[SUM] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[SUM] 0.00-10.13 sec 114 MBytes 94.0 Mbits/sec receiver
Server listening on 5201

As you can see, this has been quite easy and works like a charm.

You don’t need anything else to have a reliable and performant VPN connection between Azure and AWS, but if you’d like to use active-active on the Azure side too, please keep reading…

Steps to add active-active on the Azure side

  1. Make sure you have created your Azure VPN Gateway as active-active; otherwise make the change as per the instructions.
  2. Grab the secondary public IP from the Azure VPN Gateway and create a new AWS VPN connection pointing to it. You can, and unfortunately should, keep using static routing.
  3. Create two new S2S connections on the Azure side, just as we did for the initial tunnels, but pointing to the two new AWS public IP addresses. This brings the total to four connections on the gateway pair (see the CLI sketch after this list).
  4. Now you have a full VPN mesh between the two gateway pairs!
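
A rough CLI sketch of those steps, again with made-up placeholder names and IDs (the portal and PowerShell work just as well):

# 1. Enable active-active on the existing gateway. It needs a second public IP,
#    and for an existing gateway the documented route is the portal or
#    PowerShell's Set-AzVirtualNetworkGateway -EnableActiveActiveFeature.
az network public-ip create --resource-group rg-vpn --name azure-vpn-gw-pip2

# 2. On AWS: register the gateway's secondary public IP as a new customer
#    gateway and create a second VPN connection, still with static routing
aws ec2 create-customer-gateway --type ipsec.1 --bgp-asn 65000 \
    --public-ip <secondary Azure VPN GW public IP>
aws ec2 create-vpn-connection --type ipsec.1 \
    --customer-gateway-id cgw-yyyyyyyy --vpn-gateway-id vgw-xxxxxxxx \
    --options '{"StaticRoutesOnly":true}'
aws ec2 create-vpn-connection-route --vpn-connection-id vpn-yyyyyyyy \
    --destination-cidr-block 10.0.0.0/16

# 3. On Azure: one local network gateway and one connection per tunnel
#    endpoint of the new AWS VPN connection (two more of each in total)
az network local-gateway create --resource-group rg-vpn --name aws-vgw-3 \
    --gateway-ip-address <new AWS tunnel endpoint IP 1> --local-address-prefixes 172.31.0.0/16
az network vpn-connection create --resource-group rg-vpn --name to-aws-tunnel-3 \
    --vnet-gateway1 azure-vpn-gw --local-gateway2 aws-vgw-3 --shared-key '<PSK>'
az network local-gateway create --resource-group rg-vpn --name aws-vgw-4 \
    --gateway-ip-address <new AWS tunnel endpoint IP 2> --local-address-prefixes 172.31.0.0/16
az network vpn-connection create --resource-group rg-vpn --name to-aws-tunnel-4 \
    --vnet-gateway1 azure-vpn-gw --local-gateway2 aws-vgw-4 --shared-key '<PSK>'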

Please note you don’t need to make any further routing changes on AWS, as the route table is already accepting propagated routes.

Featured image: photo by israel palacio on Unsplash

3 Replies to “AWS to Azure S2S VPN”

  1. Since this is an Azure to AWS VPN connection article, shouldn’t the diagrams depict AWS on the left-hand side instead of an on-prem location?

    1. Hello Nikita,

      You are absolutely right, I just used the ones available on the Azure docs. I am terrible at drawing diagrams! 🙂

      Cheers
      Pedro
