Most NSX design documents assume that the NSX ESG (Edge Services Gateway) connects to physical routers. These are usually the border leaf switches if you are using a spine-leaf architecture, or the core switches if you are using a 3-tier architecture. Below are some examples.
Reference: NSX Design Guide
In certain scenarios, the above is not always the case, especially when existing 2/3-tier firewalls are in place and you cannot change the architecture. There are also instances where you have a combined compute-edge or management-edge cluster and the Top of Rack (ToR) switches are the only switches available to connect to the NSX Edges. In that case, the ESXi management VLAN SVI and the external VLAN SVIs for the NSX uplinks all terminate on the same ToR, which means they are routable to each other.
Normally, there is a perimeter firewall, either Internet-facing or internal, which can prevent routing between these SVIs if you terminate the external VLAN SVIs on the perimeter firewall instead. Voilà! But don’t be too happy yet because, back to my first point, there is not much documentation out there explaining how to do that. That is the rationale for this post!
Personally, in my customer engagements, the majority of the time I have to design for connecting the NSX ESG to firewalls.
Let’s list all the considerations and options.
I assume firewalls are deployed in a pair for high availability and for maintenance purposes. Most vendors support both Active/Standby and Active/Active. Most of the deployments I have seen were Active/Standby, as the traffic flow is more deterministic and easier to troubleshoot.
NSX Edges can be deployed in Edge-HA mode or ECMP mode. The major difference is stateful services: Edge-HA supports stateful services such as NAT, firewall and load balancer, while ECMP mode does routing only.
Firewalls in Active/Active mode should perform better than Active/Standby, as both physical firewall appliances are able to process traffic.
NSX Edge ECMP mode supports up to 8 Edges. So if you require performance, ECMP NSX Edges would be the choice.
As far as I know, some firewall vendors can cluster up to 8 firewalls in Active/Active. Active/Active firewalls would be more scalable, but given the difficulty of operating this kind of deployment, I doubt there are any such deployments. I still think the Active/Standby firewall model is more manageable.
NSX Edge-HA is just a checkbox you select during deployment. NSX ECMP Edges have to be configured one by one. In terms of manageability, NSX Edge-HA is easier.
NSX Edge-HA will also allow you to configure any stateful services without any redeployment of NSX Edges.
A failure in an Active/Active firewall pair has less impact on traffic than a failure in an Active/Standby pair, where all the traffic has to be redirected from one appliance to the other.
NSX Edge-HA failover from the Active Edge to the Standby Edge takes about 15 seconds. My colleague Kian Wah would say 22 seconds because he tested it. NSX ECMP failover takes about 3-4 seconds, depending on the routing protocol timers you configured.
There is nothing much to consider here for the security aspect.
1) Active/Standby Firewalls with NSX Edge-HA
2) Active/Standby Firewalls with NSX Edge-ECMP
3) Active/Active Firewalls with NSX Edge-HA
4) Active/Active Firewalls with NSX Edge-ECMP
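The four options above are simply every pairing of the two firewall modes with the two NSX Edge modes. As a minimal Python sketch (the names `FIREWALL_MODES`, `EDGE_MODES` and `design_options` are made up for illustration), the list can be generated like this:

```python
from itertools import product

# The two firewall HA modes and two NSX Edge modes discussed in this post.
FIREWALL_MODES = ["Active/Standby", "Active/Active"]
EDGE_MODES = ["Edge-HA", "Edge-ECMP"]


def design_options():
    """Return every firewall/Edge pairing, in the same order as the list above."""
    return [
        f"{fw} Firewalls with NSX {edge}"
        for fw, edge in product(FIREWALL_MODES, EDGE_MODES)
    ]


if __name__ == "__main__":
    for number, option in enumerate(design_options(), start=1):
        print(f"{number}) {option}")
```

Writing it this way makes it obvious that adding a third mode on either side multiplies the number of combinations to test.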
Decision on what to Test
I would like to test all of the above scenarios if I had the time. For now, let’s pick one option to test which hopefully covers 80% of the scenarios. I usually don’t get requirements for Active/Active firewalls, so I will rule out Options 3 and 4.
The major design quality that separates Options 1 and 2 is, I would say, performance. If you require more than 10 Gbps, Option 2 would be the way to go. Then again, I seldom see customers with 10 Gbps north-south requirements. Let’s explore Option 1 further.
Additional note on Option 2: I’m not sure whether this option even makes sense. I will have to revisit it, or probably test it out. For now, the focus will be on Option 1.
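The reasoning used to separate Options 1 and 2 can be written down as a small decision helper. This is only a sketch of my rule of thumb, not a sizing tool: the 10 Gbps threshold comes from the paragraph above and is not a product limit, `pick_option` is a made-up name, and raising an error when stateful services and high throughput are both required is my own assumption, since that case is not covered in this post.

```python
# Rule-of-thumb threshold from this post; an assumption, not a product limit.
EDGE_HA_THROUGHPUT_LIMIT_GBPS = 10


def pick_option(north_south_gbps, needs_stateful_services):
    """Pick between Option 1 (Edge-HA) and Option 2 (Edge-ECMP).

    Both options assume Active/Standby firewalls.
    """
    over_limit = north_south_gbps > EDGE_HA_THROUGHPUT_LIMIT_GBPS
    if needs_stateful_services and over_limit:
        # Assumption: ECMP Edges do routing only, so this combination
        # needs a design discussion rather than an automatic answer.
        raise ValueError(
            "stateful services need Edge-HA, but throughput exceeds "
            "the rule-of-thumb limit"
        )
    if over_limit:
        return "Option 2: Active/Standby Firewalls with NSX Edge-ECMP"
    return "Option 1: Active/Standby Firewalls with NSX Edge-HA"


if __name__ == "__main__":
    print(pick_option(north_south_gbps=4, needs_stateful_services=True))
    print(pick_option(north_south_gbps=20, needs_stateful_services=False))
```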
Most firewall vendors support static routing, OSPFv2 and BGP, and the NSX Edge supports the same. NSX Edge-ECMP would logically require OSPF or BGP for ECMP.
Decision on what to Test and Goal
I will test all 3 scenarios because I am not sure what the behaviour will be like. The goal of these tests is to provide a basis for future design decisions and some recommendations for my customers.
I foresee it will probably take a while to test and document all the various options, so I decided to break this up into 3 parts.
1) Active/Standby Firewalls with NSX Edge-HA using static routing (Part 1) [Not Done]
2) Active/Standby Firewalls with NSX Edge-HA using OSPF routing protocol (Part 2)
3) Active/Standby Firewalls with NSX Edge-HA using BGP routing protocol (Part 3) [Not Done]
I will probably have access to Cisco ASA and Checkpoint VE appliances. High availability works almost the same way on most firewalls, so I guess these two brands should suffice to represent the rest.
1) Ping test to make sure everything is working
2) Failover test – Fail the Active Firewall and measure how long the failover takes
3) Failover test – Fail the Active NSX ESG in Edge-HA and measure how long the failover takes
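One simple way to measure the failover time in tests 2 and 3 is to run a continuous ping through the firewall and the ESG, then look for the longest gap between two successful replies. The sketch below assumes the ping results have already been collected as `(timestamp_in_seconds, reply_received)` pairs; how you collect them and the function name `longest_outage` are up to you and are not part of any NSX or firewall tooling. The result is only accurate to within one ping interval.

```python
def longest_outage(samples):
    """Return the longest gap, in seconds, between two successful pings
    that had at least one lost ping in between.

    samples: list of (timestamp_seconds, reply_received), sorted by time.
    An outage that never recovers before the samples end is not counted.
    """
    longest = 0.0
    last_ok = None      # timestamp of the most recent successful reply
    in_outage = False   # True once a ping has been lost after last_ok
    for timestamp, reply_received in samples:
        if reply_received:
            if in_outage and last_ok is not None:
                longest = max(longest, timestamp - last_ok)
            last_ok = timestamp
            in_outage = False
        else:
            in_outage = True
    return longest


if __name__ == "__main__":
    # Hypothetical data: one ping per second, replies lost at t=3, 4 and 5.
    samples = [
        (0.0, True), (1.0, True), (2.0, True),
        (3.0, False), (4.0, False), (5.0, False),
        (6.0, True), (7.0, True),
    ]
    print(longest_outage(samples))  # 4.0
```

A shorter ping interval gives a tighter measurement, which matters when comparing figures like 3-4 seconds against 15 or 22 seconds.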
To be continued…