Volume 2: Achieving Any-to-Any Connectivity with Time-Tested Design Approaches and Proven Methodologies described 3-Stage and 5-Stage Clos architectures and how they support any-to-any connectivity. This document describes how the arrangement of the elements in the architecture relates to reliability, leveraging the availability calculation method that would typically be used and showing how various equipment configurations may be compared.
As mentioned previously, one of the top priorities of the Office of Management and Budget (OMB) is to maintain high availability of services, which requires a reliable network infrastructure as its foundation. It may seem obvious, but reliability in the data center is of greater importance today for a few reasons.
The importance of reliability and availability becomes more critical with each passing monthly traffic report. Another reason the hardware being deployed must be assembled into a resilient design is that the protocols used to deploy the data center rely on policy rather than metrics, and on peering rather than link-state protocols, to preserve the fabric's multiple connections. Therefore, the age-old consideration of 1RU/2RU form factors versus chassis-based equipment presents itself.
One of the benefits of the IP Fabric deployed in a Clos-based architecture is that it provides efficiency in establishing any-to-any connections. Applied as a network-wide connection system, rather than to the switching elements within a single product design (the scope of Clos's original 1953 evaluation), it provides higher network system availability; this is a direct result of the overlay and underlay working together to keep services running for the subscriber community of the various applications hosted in the Data Center PoD (Point of Delivery). By utilizing a single form-factor based infrastructure, deployed in a framework that reflects a crossbar switching platform (like modern chassis-based packet switching and routing systems), the IP Fabric underlay provides the basis for a system availability that, model dependent, exceeds five nines.
One standard that assists planners in selecting high availability elements for their design is the manufacturer-calculated MTBF (mean time between failures). The MTBF of the individual devices is used to determine element availability, which in turn is used to calculate network availability; a risk assumption of 8 hours MTTR (mean time to restoral) was added. To determine system availability, we took the element availability rate of each device and applied the parallel availability method, using a model of a 4-Leaf (A) and 4-Spine system connected to a second 4-Leaf system (B) as the baseline. The calculation is based upon the element MTBF and the organization's MTTR targets to arrive at the availability (A).
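As a brief illustration of this first step, the sketch below derives element availability from a manufacturer MTBF and the 8-hour MTTR assumption; the MTBF value shown is a hypothetical placeholder rather than a figure from any vendor datasheet.

```python
# Element availability from MTBF and MTTR (both in hours).
# The MTBF below is a hypothetical placeholder, not a vendor-published value.
MTBF_HOURS = 200_000   # assumed manufacturer-calculated MTBF
MTTR_HOURS = 8         # risk assumption used in this document

element_availability = MTBF_HOURS / (MTBF_HOURS + MTTR_HOURS)
print(f"Element availability: {element_availability:.5%}")   # ~99.99600%
```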
While Clos architecture principles were originally applied to the transistor and switching chipset technology of the time, the model works well for a system architecture if the individual element MTBF/availability numbers are known.
Expressed below is the availability calculation for the typical leaf switch element of a single form factor 1RU (rack unit) system, using the parallel availability method.
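The calculation can be sketched as follows; the per-element availability is the illustrative value carried over from the MTBF/MTTR step above, and the count of parallel elements is an assumption, so the output will not reproduce the exact figure quoted next.

```python
import math

# Parallel availability: the leaf tier is unavailable only when every
# redundant element is unavailable at the same time.
element_availability = 0.99996   # hypothetical value from the MTBF/MTTR step
elements_in_parallel = 4         # assumed: Leaf 1 through Leaf 4

# Work with unavailability to avoid floating-point rounding near 1.0.
tier_unavailability = (1 - element_availability) ** elements_in_parallel
nines = -math.log10(tier_unavailability)
print(f"Leaf tier unavailability: {tier_unavailability:.2e}")
print(f"Equivalent nines of availability: {nines:.1f}")
```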
Thus, the Leaf (access to the fabric) availability to the server, that is, the network availability, is 99.9999999999985%, or 13 nines. The Spine calculation would be performed in the same manner as for a Leaf system. As shown below, the availability of the Leaf-Spine-Leaf fabric architecture uses a serial calculation in which the unavailability values are added: L1-4u (Leaf 1 through 4 unavailability) plus S1-4u (Spine 1 through 4 unavailability) plus L5-8u (Leaf 5 through 8 unavailability, the third stage of the traffic flow in a server-to-server model), to arrive at the system availability from server to server for East-West traffic. In a typical data center, East-West traffic accounts for up to 80% of the traffic across the fabric.
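The serial step described above can be sketched as follows; the tier unavailability values are the illustrative ones from the parallel sketch, not the document's calculated figures.

```python
# Serial (end-to-end) unavailability across the three stages of the fabric:
# ingress Leaf tier (L1-4u), Spine tier (S1-4u), egress Leaf tier (L5-8u).
# Tier values are illustrative placeholders carried over from the sketch above.
leaf_a_unavail = 2.56e-18   # Leaf 1-4 tier
spine_unavail  = 2.56e-18   # Spine 1-4 tier, calculated the same way
leaf_b_unavail = 2.56e-18   # Leaf 5-8 tier

# For values this small, serial unavailability is well approximated by the sum,
# and system availability is 1 minus that sum.
system_unavailability = leaf_a_unavail + spine_unavail + leaf_b_unavail
print(f"Server-to-server unavailability: {system_unavailability:.2e}")
```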
The resulting availability between the elements of the 3-stage Clos-based underlay of the IP fabric F1 (Fabric 1), or PoD, is 99.9999999999984%, or 13 nines. This amounts to an expected unavailability of roughly 5.08e-07 seconds per year, which can also be expressed as approximately 508 nanoseconds per year, slightly more than half of one millionth of a second. A 3-stage Clos configuration yields very high availability as a direct result of the architecture used. In addition, the 3-stage Clos retains its any-to-any capability while delivering ultra-high availability (well beyond 5 nines; in this case, 13 nines). Systems with lower MTBF values may yield lower system availability overall, but the question to ask is: does this design create an unacceptable level of unavailability?
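To keep the unit conversion above explicit, the short sketch below turns the quoted fabric availability into expected downtime per year, assuming a 365.25-day year.

```python
# Convert the quoted fabric availability into expected downtime per year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600        # ~31,557,600 seconds

fabric_availability_pct = 99.9999999999984   # 13 nines, from the text
unavailability_fraction = 1 - fabric_availability_pct / 100
downtime_seconds = unavailability_fraction * SECONDS_PER_YEAR
# Roughly 5e-07 seconds, i.e. on the order of 500 nanoseconds per year.
print(f"Expected downtime: {downtime_seconds:.2e} s/yr "
      f"({downtime_seconds * 1e9:.0f} ns/yr)")
```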
A chassis-based infrastructure yields similar results when we evaluate network availability from a set of servers to a fully redundant SLX 9850 chassis-based system configured with the appropriate level of N+1 redundant componentry. When calculating the individual common equipment within the system, we find that elemental blocks such as the switch fabric modules (S1-S6), power supplies (P1-P4), and fan assemblies (F1-F6) can be treated as contributing mathematically 100% availability to the system. Due to such high levels of redundancy, the chassis common equipment unavailability can be lower than 1/googol of a second (a figure extending beyond 100 decimal places). This is achieved through extensive hardware redundancy, with subcomponents working in N:1 or N+1 redundant configurations. For example, each switching fabric module measures 8 nines of availability. To arrive at the load-sharing availability of all installed switch fabrics (Sa) supporting parallel paths from the line cards, we multiply the unavailabilities of switch fabric modules S1 through S6. When implemented in a PoD, the single form factor and the chassis-based solution yield similar results.
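As an illustration of that multiplication, the sketch below combines six switch fabric modules, each assumed to measure eight nines of availability, into a single load-shared figure; the per-module value is taken from the text, while the rest is illustrative.

```python
# Load-shared availability of the chassis switch fabric modules (S1-S6):
# the switching fabric is lost only if all six modules fail simultaneously.
sfm_availability = 0.99999999    # eight nines per module, as stated in the text
sfm_count = 6

combined_unavailability = (1 - sfm_availability) ** sfm_count
print(f"Combined switch fabric unavailability: {combined_unavailability:.1e}")
# On the order of 1e-48, i.e. effectively zero for planning purposes.
```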
The general conclusion can be made that the chassis-based infrastructure buys an additional half a nanosecond of availability when compared to the single form factor implementation. The exercise also demonstrates that embedding more than 3 access links (LAG) or Leaf-to-Spine connections adds cost and connectivity assurance but does not yield significantly higher availability. In this model, the three stages run from the server to the ingress line card (L1-L4, stage 1), across parallel connections to switch fabrics 1-6 (S1-S6, stage 2), and back through the egress line card to the server port (stage 3).
To conclude, when costs and ease of expansion are factored into the decision, a single form factor implementation may be less expensive to acquire, deploy, and operate, and better facilitates a ‘pay-as-you-go’ cost model. While in the past the importance of the traffic, the amount of growth, and the proven higher availability of chassis systems dictated their usage, the overall system configuration (3- or 5-stage Clos-based PoDs built from 1RU/2RU elements) negates the added-benefit-versus-cost consideration typically incurred by selecting a chassis solution.
There is, however, one inescapable benefit of utilizing a chassis-based system as an element within an IP fabric data center: it enables higher scale at the access layer while reducing the number of Spines. This should be weighed against the impact of a failure of one of the chassis Spine switches versus a single form factor-based Spine switch (risk, mean time to recovery). In a comparable configuration with a 1RU-based single form factor spine consisting of 8 units, with 8 x 100G links from Leaf to Spine, only 12.5% of the overall bandwidth is lost if one unit fails. When a chassis-based system deployed as a Leaf fails, 100% of the access bandwidth is gone; if deployed as a Spine (with 8 x 100G links from the Leaf layer to 2 units of 4-slot SLX 9850s), 50% of the bandwidth is lost.
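One way to frame that trade-off is the fraction of Leaf-to-Spine bandwidth lost when a single spine element fails, sketched below under the assumption that uplinks are spread evenly across the spine elements.

```python
# Fraction of Leaf-to-Spine bandwidth lost when spine elements fail,
# assuming uplinks are distributed evenly across all spine elements.
def bandwidth_lost(spine_count: int, failed: int = 1) -> float:
    return failed / spine_count

print(f"8 x 1RU spines, one fails:     {bandwidth_lost(8):.1%} lost")  # 12.5%
print(f"2 x chassis spines, one fails: {bandwidth_lost(2):.1%} lost")  # 50.0%
```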
Therefore, the general question to ultimately ask the technical director or program manager is:
Does the Data Center planner want to risk 50% of the Data Center PoD bandwidth to a system calculated to have roughly half a microsecond of unavailability per year?
Or does the Data Center planner want to risk 12.5% of the data center PoD bandwidth to a system with a marginally higher, but still roughly half-a-microsecond, calculated unavailability per year?
Where in previous decades the risk involved in choosing, or not choosing, a chassis system may have made the decision clear, today's choice involves splitting mere nanoseconds.
In this document, industry standard definitions for element availability were used for comparison purposes. A uniform, standard means of defining network-level availability by tier in the data center was identified. The use of single form factor switching elements was compared with chassis-based systems. Deploying the various elements in series or in parallel provided a means of comparison, along with the resulting industry standard availability measurement when deployed in a Point of Delivery (PoD), or data center fabric system.
Upon reviewing the total availability levels of the fabric, the underlay can now be programmed onto the hardware. From this point, the element software maintains reachability across the rock-solid foundation built from these elements, which are configured to deliver the any-to-any connectivity discussed in the previous document.