The critical importance of redundancy in data centers

Published on: 02/07/2024

Redundancy is crucial to modern data centers and essential for ensuring high availability and reliability of services. It means providing multiple installations of the same critical components, such as power supply, climate control, connectivity, monitoring, security, and backups. Power outages and other service interruptions are expensive, so data centers must avoid unplanned downtime at all costs, and redundancy is one of the key factors in achieving that.

The (double) strength of redundancy

Redundancy offers indispensable assurance to data centers and their customers. Firstly, it increases reliability: systems continue to function despite possible failures or interruptions. Additionally, it improves availability: data center services stay active, even during maintenance or component failures. This helps mitigate risks and is crucial for business-critical applications and the uptime of our data centers.

According to Uptime Institute’s 2022 Global Data Center Survey, published in 2023, more operators are investing in their resiliency. About 40% of respondents reported that they were increasing the redundancy levels at their primary data centers. Power and cooling systems have received similar attention, with a third of operators upgrading either or both.

Although redundant systems mean spending more on hardware, the high cost of data center downtime can often justify these investments, at least up to a point. Redundancy reduces the risk of those costs by keeping operations running and services up for customers.

Redundancy options

In a modern data center, many components can be installed redundantly. This section covers the most common options.

Power

A common example of power redundancy is independent power circuits: fully separated A and B supplies feed the equipment, so devices remain functional even if one supply fails. These circuits can be further backed by redundant UPS (Uninterruptible Power Supply) systems and emergency generators in case of a power outage.

When the regular power supply fails, the UPS immediately detects the disruption and switches to battery power to keep the supply consistent. Emergency generators do not start immediately; they need time to start up, which can take up to a few minutes. Once they are running, they take over from the UPS. So even the backup systems that cover a power outage can have redundant equivalents of their own.
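The handover described above can be sketched as a simple check. The following is a minimal Python sketch, not a model of any real installation: the function name `load_stays_powered` and all timing figures are hypothetical, and it assumes that a single generator can carry the full load and that fuel is not a constraint during the outage.

```python
def load_stays_powered(outage_s: float, ups_runtime_s: float,
                       generator_start_times_s: list[float]) -> bool:
    """Hypothetical check: does the load stay powered during a utility outage?

    outage_s: duration of the utility outage, in seconds
    ups_runtime_s: how long the UPS batteries can carry the load
    generator_start_times_s: start-up time of each generator that starts
        successfully (a generator that fails to start is simply omitted)

    Assumes one generator can carry the full load and fuel is unlimited.
    """
    # The UPS switches to battery immediately, so a short outage is fully bridged.
    if outage_s <= ups_runtime_s:
        return True
    # For a longer outage, at least one generator must be running
    # before the batteries are exhausted.
    return any(t <= ups_runtime_s for t in generator_start_times_s)


# Hypothetical figures: 10 minutes of battery autonomy, generators needing 1-2 minutes.
print(load_stays_powered(3600, 600, [90, 120]))  # True: both generators start
print(load_stays_powered(3600, 600, [120]))      # True: one generator failed to start
print(load_stays_powered(3600, 600, []))         # False: no generator started
```

The second call shows why redundant generators matter: if one fails to start, the other still takes over before the batteries run out.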

Cooling and security

Cooling systems in data centers are often designed with more capacity than strictly necessary, spread across independent units, so that a failure in one unit does not affect customer operations. If a unit fails, there must be no risk of overheating, because excessive heat can cause servers to malfunction. That is why redundant cooling units are provided. The same applies to security systems (including fire suppression, physical and digital security) and to system monitoring. Ensuring that everything runs safely and optimally for customers is a data center's core task, which makes redundancy essential.
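Whether spare capacity really covers a failed unit can be checked with simple arithmetic. The minimal Python sketch below assumes that unit capacities simply add up and ignores airflow distribution; the function name and the kW figures are hypothetical.

```python
def survives_single_unit_failure(unit_capacities_kw: list[float],
                                 heat_load_kw: float) -> bool:
    """Return True if the cooling units left after losing any one unit
    can still remove heat_load_kw (hypothetical; capacities assumed additive)."""
    # The worst case is losing the largest unit.
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= heat_load_kw


# Hypothetical hall with a 300 kW heat load.
print(survives_single_unit_failure([100, 100, 100, 100], 300))  # True: spare unit covers a failure
print(survives_single_unit_failure([100, 100, 100], 300))       # False: no spare capacity
```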

Connectivity

Connectivity is best designed with redundant entry points into the building. This avoids single points of failure, such as spots where both redundant lines cross, so that if one line is disrupted, the connection is not lost. Some data centers even offer two redundant meet-me rooms that are fully independent of each other.
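The rule of avoiding points where both lines cross can be expressed as a check for points shared by two routes. In this minimal Python sketch, a route is assumed to be an ordered list of named points (building entries, risers, meet-me rooms, racks); all names are hypothetical, and the customer rack at the end of both routes is explicitly allowed to be shared.

```python
def shared_points(route_a: list[str], route_b: list[str],
                  allowed_shared: tuple[str, ...] = ()) -> set[str]:
    """Return the points both routes pass through, excluding allowed_shared.

    Any point returned is a potential single point of failure: damage there
    would disrupt both 'redundant' routes at once.
    """
    return (set(route_a) & set(route_b)) - set(allowed_shared)


# Hypothetical routes from two building entries to the same customer rack.
route_a = ["entry-north", "riser-1", "mmr-a", "rack-42"]
route_b = ["entry-south", "riser-1", "mmr-b", "rack-42"]
print(shared_points(route_a, route_b, allowed_shared=("rack-42",)))  # {'riser-1'}
```

Here both routes run through the same riser, so the design still has a single point of failure even though the entry points and meet-me rooms are separate.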

A more specific aspect of redundancy concerns patches or cross-connects: physical connections between a telecom operator and a customer. From the meet-me room, cross-connects run to the racks, cages, or suites where end customers have their equipment. Data centers can opt for redundant structured cabling or connectivity cabling to both meet-me rooms. If a connection in one part of the data center then fails, an alternative connection is available and service continues. This is especially important for companies whose end customers rely on 24/7 access to their services.

Levels of redundancy

Now that we know which components can be redundant, we can look at the different possible levels of redundancy. Redundancy in data centers is categorised into different levels to indicate how robust a system is against failures. Terms such as N+1, N+2, 2N, and 2N+1 are commonly used to describe these levels. But what do these terms mean?

N+1 Redundancy:

N stands for the number of components required to ensure basic functionality. +1 means there is one additional component beyond the necessary number.

For example, if a data center needs 3 cooling units to maintain optimal temperature, an N+1 setup would mean there are 4 cooling units; one extra as a backup in case one of the units fails.

N+2 Redundancy:

This involves two extra components beyond the basic requirement. It offers a higher level of safety as more backups are available. For instance, if two of the required components fail simultaneously, the system will still function.

2N Redundancy:

This is a completely duplicated setup of all required components. So, if a data center needs 5 cooling units, a 2N configuration would have a total of 10 units.

This level of redundancy ensures very high availability because the entire system can continue to operate, even if half of the components fail.

2N+1 Redundancy:

This is similar to 2N redundancy but includes an additional component.

In the cooling unit example, this means there is not only a duplicate set of 5 units (the 10 mentioned above) but also one extra unit as further backup, so 11 in total.
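The arithmetic behind these levels can be summarised in a few lines. This minimal Python sketch assumes the required components are interchangeable, so "failures tolerated" simply counts spare units; in practice a 2N design is often built as two fully independent systems, where the point is surviving the loss of an entire system. The function names are hypothetical.

```python
REDUNDANCY_LEVELS = ("N", "N+1", "N+2", "2N", "2N+1")


def total_units(n: int, level: str) -> int:
    """Number of installed units for a requirement of n units at a given level."""
    totals = {
        "N": n,             # no redundancy
        "N+1": n + 1,       # one spare
        "N+2": n + 2,       # two spares
        "2N": 2 * n,        # fully duplicated
        "2N+1": 2 * n + 1,  # fully duplicated plus one spare
    }
    return totals[level]


def failures_tolerated(n: int, level: str) -> int:
    """Spare units available, assuming all units are interchangeable."""
    return total_units(n, level) - n


for level in REDUNDANCY_LEVELS:
    print(f"{level:>5}: {total_units(5, level):2d} units, "
          f"tolerates {failures_tolerated(5, level)} failures")
```

For the cooling examples above, `total_units(3, "N+1")` gives 4, `total_units(5, "2N")` gives 10, and `total_units(5, "2N+1")` gives 11.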

These levels of redundancy ensure that a data center can operate reliably, even under unforeseen circumstances such as technical failures. The higher the level of redundancy, the lower the chance of downtime, but the higher the cost. The right level therefore also depends on how critical the services provided by the data center are.
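The trade-off between redundancy level and downtime risk can be illustrated with a binomial model. The minimal Python sketch below computes the probability that fewer than the required N units are working. It assumes every unit fails independently and has the same availability, which real installations do not guarantee (a shared power feed or a common design flaw can take out several units at once), and the 99% availability figure is purely hypothetical.

```python
from math import comb


def prob_insufficient_capacity(n_required: int, n_installed: int,
                               unit_availability: float) -> float:
    """Probability that fewer than n_required of n_installed units are working,
    assuming independent units that are each available with probability
    unit_availability."""
    p = unit_availability
    return sum(
        comb(n_installed, k) * p**k * (1 - p) ** (n_installed - k)
        for k in range(n_required)  # k = number of working units, 0 .. n_required - 1
    )


n = 5
for level, installed in (("N", n), ("N+1", n + 1), ("N+2", n + 2),
                         ("2N", 2 * n), ("2N+1", 2 * n + 1)):
    risk = prob_insufficient_capacity(n, installed, 0.99)
    print(f"{level:>5}: {risk:.2e}")
```

Under these assumptions each extra unit lowers the computed risk, while the number of units to buy keeps growing, which is the cost trade-off described above.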

Conclusion

Redundancy is not a luxury but a necessity in modern data centers: its benefits for reliability and availability are crucial for business-critical operations. Some operators go even further than redundancy of critical installations within one data center and opt for geo-redundancy. This means duplicating data center components in different regions or even countries, so that a climate event in one country or continent won't halt their operations. Better to be safe than sorry.

But as mentioned above, even redundancy of critical components such as power, cooling, and connectivity within a single data center goes a long way toward preventing service interruptions. The increased investment in redundancy reported by the Uptime Institute reflects its value in mitigating the costly impact of downtime. With levels ranging from N+1 to 2N+1, every data center can choose a configuration that matches its need for continuous operation.
