AWS Transit Gateway can be considered a one-stop solution for all your edge routing requirements. You can scale your AWS environment to hundreds of VPCs across many accounts and connect them to your offices around the world using various combinations of TGW peering, Direct Connect transit VIFs and VPN connections.
When connecting hundreds or thousands of VPCs to an on-premises network, you might not have perfect IP subnet segregation between the on-premises and AWS environments, which means that you cannot simply point a 10.0.0.0/8 route at AWS and establish connectivity. The transit VIF supports a prefix list option, which lets you summarize your VPC subnets into up to 20 prefixes that are advertised to the on-premises network. For VPN connections, however, the maximum number of routes that the Transit Gateway can advertise towards the on-premises network is 1000. This is enough for most workloads, but there are times when customers have hundreds of VPCs with multiple CIDR ranges each, or provide a separate VPC for each of their end users, which can push the number of CIDR ranges beyond 1000.
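To see why non-contiguous blocks are a problem, here is a minimal Python sketch that counts routes before and after merging blocks exactly. The CIDR layout is made up for illustration (every other /24 in ten /16 blocks), and `VPN_ROUTE_LIMIT` simply restates the 1000-route figure mentioned above; neither comes from a real environment.

```python
import ipaddress

# Assumption: the 1000-route VPN figure quoted in the text above.
VPN_ROUTE_LIMIT = 1000

def count_routes(vpc_cidrs):
    """Return (raw, collapsed) route counts for a list of CIDR strings."""
    networks = [ipaddress.ip_network(c) for c in vpc_cidrs]
    # collapse_addresses only merges blocks that combine exactly into a
    # larger block, or that are already contained in another block.
    collapsed = list(ipaddress.collapse_addresses(networks))
    return len(networks), len(collapsed)

# Hypothetical layout: every other /24 in 10.2.0.0/16 through 10.11.0.0/16.
vpc_cidrs = [f"10.{second}.{third}.0/24"
             for second in range(2, 12)
             for third in range(0, 240, 2)]

raw, collapsed = count_routes(vpc_cidrs)
print(raw, collapsed, collapsed > VPN_ROUTE_LIMIT)
```

With this layout no two blocks can be merged, so exact aggregation leaves the route count unchanged and still above the limit. That is the case where summarizing to a shorter prefix becomes interesting.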
Another situation that may arise is when the customer simply wants to summarize the VPN routes sent over BGP to their on-premises network as much as possible. Normally, for an architecture of this size, a dedicated AWS Direct Connect connection would be the recommended approach; however, each customer has different workloads and requirements, including VPN over Direct Connect.
This hack for Transit Gateways explains a way to summarize routes for BGP-based VPN connections from AWS towards the on-premises network.
In this setup, on the left side we have the on-premises or corporate data centre network with ranges within 10.0.0.0/8, whereas on the right side we have customer VPCs with CIDR ranges such as 10.2.0.0/24 and 10.2.1.0/24, i.e. blocks of /24 or larger that are not assigned as contiguously as you would like.
Generally, the routes advertised over TGW-based VPN connections from AWS towards the CGW are the routes from the route table associated with the Transit Gateway VPN attachment. With a single Transit Gateway, these will be routes to all the VPCs attached to it; they can be non-contiguous, poorly segregated blocks, and their number can go beyond 1000.
The hack presented here uses two Transit Gateways and a Dummy VPC. These components function as follows:
TGW-B: All the VPCs are attached to this Transit Gateway, with the VPC attachment route tables having a route to the on-premises supernet 10.0.0.0/8. The Dummy VPC attachment's route table will have routes to all the individual VPC attachments.
Dummy VPC: You can call it a Dummy or Transit VPC (but I would like to avoid the name Transit VPC because of an existing custom solution). This VPC does not contain any resources, just the Transit Gateway attachments and a VPC route table with the summarized VPC routes pointing to the TGW-B attachment and the on-premises supernet route pointing to the TGW-A attachment.
TGW-A: Your VPN connection is configured on this Transit Gateway, and the VPN attachment route table has the VPC summary routes pointing to the Dummy VPC attachment.
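The component list can be written down as plain route tables to sanity-check the design before building it. The sketch below is only an illustration: the attachment names, the sample CIDRs and the `uncovered_vpcs` helper are hypothetical and are not AWS APIs. It also assumes that the 10.0.0.0/8 route in the TGW-B VPC attachment route tables targets the Dummy VPC attachment.

```python
import ipaddress

# Hypothetical sample values for illustration only.
ON_PREM_SUPERNET = "10.0.0.0/8"
VPC_CIDRS = {"vpc-1": "10.2.0.0/24", "vpc-2": "10.2.1.0/24", "vpc-3": "10.4.7.0/24"}
SUMMARIES = ["10.2.0.0/16", "10.4.0.0/16"]

# TGW-A, VPN attachment route table: summaries -> Dummy VPC attachment.
tgw_a_vpn_rt = {s: "dummy-vpc-attachment" for s in SUMMARIES}

# Dummy VPC route table: summaries -> TGW-B, on-premises supernet -> TGW-A.
dummy_vpc_rt = {s: "tgw-b-attachment" for s in SUMMARIES}
dummy_vpc_rt[ON_PREM_SUPERNET] = "tgw-a-attachment"

# TGW-B, Dummy VPC attachment route table: one specific route per VPC.
tgw_b_dummy_rt = {cidr: f"{name}-attachment" for name, cidr in VPC_CIDRS.items()}

# TGW-B, VPC attachment route tables (assumed target: the Dummy VPC attachment).
tgw_b_vpc_rt = {ON_PREM_SUPERNET: "dummy-vpc-attachment"}

def uncovered_vpcs(vpc_cidrs, summaries):
    """Return names of VPCs whose CIDR is not inside any summary route."""
    nets = [ipaddress.ip_network(s) for s in summaries]
    return [
        name for name, cidr in vpc_cidrs.items()
        if not any(ipaddress.ip_network(cidr).subnet_of(n) for n in nets)
    ]

print(uncovered_vpcs(VPC_CIDRS, SUMMARIES))
```

Any VPC returned by `uncovered_vpcs` would be reachable from TGW-B but would not be advertised to on-premises through a summary, so an empty list is what you want to see.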
BGP routing:
Over the BGP connection, the AWS VPN endpoint will advertise all the summarized VPC routes, such as 10.2.0.0/16, 10.4.0.0/16, 10.5.0.0/16 and so on. This can still be a large number, but it is generally much smaller than the total number of VPC subnets. The CGW device will advertise the on-premises 10.0.0.0/8 supernet and any other prefixes specific to the existing environment.
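One way to derive the summary routes is to map every VPC CIDR to its covering block of a chosen prefix length and remove duplicates. This is a rough sketch, not something AWS does for you: the `summarize` helper, the /16 summary length and the sample CIDRs are assumptions for illustration.

```python
import ipaddress

def summarize(vpc_cidrs, prefix_len=16):
    """Map each CIDR to its covering block of length prefix_len and dedupe."""
    summaries = set()
    for cidr in vpc_cidrs:
        net = ipaddress.ip_network(cidr)
        if net.prefixlen < prefix_len:
            # Already a bigger block than the summary length; keep it as is.
            summaries.add(net)
        else:
            summaries.add(net.supernet(new_prefix=prefix_len))
    return sorted(summaries)

# Hypothetical VPC CIDRs, loosely following the ranges used in this post.
vpc_cidrs = ["10.2.0.0/24", "10.2.1.0/24", "10.4.7.0/24",
             "10.5.3.0/25", "10.5.200.0/24"]

for summary in summarize(vpc_cidrs):
    print(summary)
```

Keep in mind that each summary also covers address space in that block which is not assigned to any VPC, so this only works if you are happy for on-premises to send the whole block towards AWS.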
Data plane routing and route lookup:
Let's say you want to connect to resources in VPC-1 from on-premises servers or users. Traffic from those users goes to the CGW device, which has a route to the 10.2.0.0/16 network. The traffic is then encrypted over the VPN connection to reach the AWS VPN endpoint and is looked up in the TGW-A VPN attachment route table.
Here the route table has a route pointing to the Dummy VPC attachment.
Once the traffic leaves TGW-A, the destination is looked up in the Dummy VPC route table, which has a route for the 10.2.0.0/16 network pointing to the TGW-B attachment.
On reaching TGW-B, the route table of the Dummy VPC attachment has a specific route for 10.2.0.0/24 with the VPC-1 attachment as its target.
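The hop-by-hop lookup described above can be replayed with a small longest-prefix-match function. Treat this as a simplified model rather than how the Transit Gateway is implemented: the `lookup` helper, the target names and the destination address 10.2.0.25 are all hypothetical.

```python
import ipaddress

def lookup(route_table, destination):
    """Longest-prefix match: return the target of the most specific route."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical route tables, one per hop of the path described above.
cgw_rt = {"10.2.0.0/16": "vpn-tunnel"}  # summary learned over BGP
tgw_a_vpn_rt = {"10.2.0.0/16": "dummy-vpc-attachment"}
dummy_vpc_rt = {"10.2.0.0/16": "tgw-b-attachment",
                "10.0.0.0/8": "tgw-a-attachment"}
tgw_b_dummy_rt = {"10.2.0.0/24": "vpc-1-attachment",
                  "10.2.1.0/24": "vpc-2-attachment"}

destination = "10.2.0.25"
path = [lookup(rt, destination)
        for rt in (cgw_rt, tgw_a_vpn_rt, dummy_vpc_rt, tgw_b_dummy_rt)]
print(" -> ".join(path))
```

The interesting hop is the Dummy VPC: both 10.2.0.0/16 and 10.0.0.0/8 match the destination, and the more specific /16 wins, so the traffic goes on to TGW-B instead of being sent back to TGW-A.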
Since traffic stays within the realm of the Transit Gateways, there is not much increase in latency, and the cost is roughly in line with that of the additional VPN connections you might otherwise end up configuring to advertise all the individual VPC CIDRs. The setup also adds very little overhead in terms of management and manual routing.