Server Redundancy: How to Reduce Downtime and Improve Performance

Server redundancy literally means more than one server. Each additional server improves resilience or performance, and a given setup can emphasize either of these areas. The main downside to redundancy is the cost, as each additional server adds to hardware, labor, and operational expenses.

When considering redundant servers, it helps to frame that cost against the cost of server downtime. From there, you’ll have to prioritize between resilience and performance to suit the needs of your organization. Let’s dig in.

Comparing Costs: Server Failure vs Redundant Servers

The cost of a server failure can be measured by the business lost during the downtime until the server is replaced. If that cost exceeds the cost of implementing a redundant server, then it is time to invest in redundancy.

This simple, universal concept applies to all companies and all situations, but the calculation for a specific server within an organization can be more subjective. Risk analysis requires us to consider the impact of downtime during peak usage hours for that server and estimate the consequences. How much business will be lost if our critical server is down for an hour? A day?

Every business has different sensitivities for different applications. If email goes down for the day for a dentist, the cost may be tens of dollars, but if the digital X-Ray machine goes offline it may cost hundreds of dollars per hour in lost business and labor expenses. For a computerized manufacturing line or a stock brokerage firm, downtime might be measured in thousands of dollars per second or more. 

These estimates of downtime and potential business losses provide both the budget and the permissible downtime for the server. The budget should also account for all of the costs associated with owning the replacement server in the configuration in which it will be used. This should include:

  • Acquisition costs of the hardware and software
  • IT labor costs for setup of the server and the redundancy configuration
  • Ongoing operation costs for the redundant server: power, cooling, IT maintenance, etc.
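The comparison described in this section can be sketched in a few lines of Python. This is only an illustration under simple assumptions: all costs are expressed per year, the names `RedundancyCost` and `redundancy_pays_off` are hypothetical, and every dollar figure is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RedundancyCost:
    """Yearly cost of owning a redundant server (hypothetical figures)."""
    acquisition: float   # hardware and software, allocated to one year
    setup_labor: float   # IT labor to set up the server and the failover
    operations: float    # power, cooling, IT maintenance, etc.

    def total(self) -> float:
        return self.acquisition + self.setup_labor + self.operations


def expected_downtime_loss(loss_per_hour: float, outage_hours_per_year: float) -> float:
    """Estimated business lost per year without a redundant server."""
    return loss_per_hour * outage_hours_per_year


def redundancy_pays_off(loss_per_hour: float,
                        outage_hours_per_year: float,
                        cost: RedundancyCost) -> bool:
    """True when the estimated downtime loss exceeds the redundancy cost."""
    return expected_downtime_loss(loss_per_hour, outage_hours_per_year) > cost.total()


# Assumed numbers, loosely modeled on the dentist example above.
spare = RedundancyCost(acquisition=1500.0, setup_labor=500.0, operations=300.0)
print(redundancy_pays_off(300.0, 8.0, spare))  # X-ray system: 2400 > 2300, prints True
print(redundancy_pays_off(5.0, 8.0, spare))    # email: 40 < 2300, prints False
```

The hard part in practice is estimating the two inputs, the loss per hour and the expected hours of outage, which is where the risk analysis above comes in.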

Redundant Servers Improve Resilience & Decrease Downtime

Server redundancy directly addresses resilience by providing additional resources that can be deployed in the event of failure. However, simply having the resource is not the same as having it ready. If we purchase a server and keep it in the shipping box in the server room, we have achieved server redundancy, but we may not be particularly resilient because it will take time to deploy that second server.

Such a manual failover compensates for poor resilience with minimal operating costs, but some of those savings will be offset by business losses. A manual failover process increases delays and should only be used for functions that can afford to wait. In the examples above, the dentist’s office may have the luxury to wait to replace their email server, but the stock brokerage does not.

Automatic failover uses software to switch automatically from a primary server to a failover, or backup, server. This option costs more because any additional servers will require the same power, maintenance, and space as the primary server. Even automatic failover may result in minor business interruptions. For example, for a credit card transaction server, if the backup server is simply kept powered on, a transaction in process may be lost when the primary server fails, forcing the shopper to restart the transaction.
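At its core, the decision that failover software makes can be sketched as "use the first healthy server in priority order." The Python below is a minimal sketch of that idea only. The function name `choose_active`, the server names, and the `is_healthy` callback are hypothetical, and a real product would also handle repeated health checks, network switching, and in-flight work such as the lost transaction described above.

```python
from typing import Callable, Sequence


def choose_active(servers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy server, checking the primary first."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")


# Simulated health status; a real check might probe a port or a status URL.
status = {"primary": False, "backup": True}
active = choose_active(["primary", "backup"], lambda name: status[name])
print(active)  # the primary is marked down, so this prints "backup"
```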

Application service providers can deploy even more resilient failover servers that synchronize alarm states, app engine status, and other information that would allow the second server to deploy immediately with minimal interruption for the users. Of course, configuring the synchronization adds to the complexity of the setup, so this could also be a more expensive option.

Redundant Servers Can Enhance Performance

If we build a backup bridge for a highway, we wouldn’t just let it sit there, unused, while traffic builds up on the original bridge. Similarly, IT managers dislike having redundant servers sit idle and burn costs while waiting for a rare failure.

Additionally, as a business grows, the original servers may become too small for peak-time usage, so IT managers face a choice. Do they start using the original server and its backup simultaneously, or do they buy two new, larger, more expensive servers?

Most begin to split up the work. The simplest form of load balancing divides work by function, with each server handling a different task. A more sophisticated but more expensive method uses software that monitors server states and allocates new tasks to the least-loaded server.
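The two approaches can be contrasted in a short Python sketch. Everything here is illustrative: the server names, the `ROUTES` table, and the idea of tracking load as a simple task count are assumptions, and commercial load balancers measure load in far richer ways.

```python
from typing import Dict

# Split by function: a fixed table decides which server handles each job type.
ROUTES: Dict[str, str] = {"email": "server-a", "web": "server-b"}


def route_by_function(job_type: str) -> str:
    """Static split: the same job type always goes to the same server."""
    return ROUTES[job_type]


def assign_least_loaded(loads: Dict[str, int], cost: int = 1) -> str:
    """Dynamic split: send the task to the server with the lowest load.

    `loads` maps server name to its current task count and is updated in place.
    Ties go to the server listed first.
    """
    server = min(loads, key=loads.get)
    loads[server] += cost
    return server


loads = {"server-a": 3, "server-b": 1}
print(assign_least_loaded(loads))  # server-b has the lower load, prints "server-b"
print(loads)                       # prints {'server-a': 3, 'server-b': 2}
```

The static table is cheap to run but cannot react when one function becomes busier than the others, which is the gap the monitoring approach is meant to fill.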

Sometimes, multiple servers are combined to form a server cluster in which two or more servers share the same IP address and function as a single server on the network. As the company grows, adding servers to the cluster can provide both enhanced performance and redundancy for resilience.

As one can expect, performance-oriented deployments may be more expensive than resilience-oriented deployments, due to the additional resources required to set up, maintain, and secure the additional active servers. However, if the business will benefit from additional performance, the costs may pay sufficient dividends to justify the trouble.

Cloud Costs and Deployment Times

Server redundancy provides the same benefits, whether accomplished in a data center or through the cloud. However, when using cloud resources, these benefits come with tradeoffs. Most importantly, the cloud and data centers each require their own specialty IT expertise to implement them correctly. 

From a comfort standpoint, many data center managers lament the loss of direct control over the hardware, but for others, offloading hardware monitoring is the benefit. From a financial standpoint, we should discuss the implications with our CFOs because we will be moving from depreciable assets (servers) to ongoing operational subscription costs. Deployments that combine cloud and local resources will also make security and failover coordination more complex than in a pure cloud or pure data center configuration.

Yet the cloud provides additional benefits that a local data center cannot deliver. Cloud options can further reduce downtime and possibly reduce costs when compared to a data center deployment. To meet its terms of service, the cloud provider will also maintain additional hardware redundancy that the subscriber does not need to actively manage.

Two huge advantages are that 1) cloud deployments deploy and decommission more quickly than physical servers and 2) there are no sunk costs for decommissioned hardware. In a traditional data center, if we have a surge in demand, we need to buy a server and install it, and that purchase has already consumed budget by the time demand drops. Even if we turn off the server, we don’t get our money back. Cloud resources, however, are trivial to turn off, and doing so saves money with no sunk costs. This speed provides flexibility and cost control for variable-demand needs such as seasonal ecommerce or sporting event broadcasts.
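To make the sunk-cost argument concrete, here is a rough Python sketch comparing a purchased server with a cloud instance billed by the hour. The function names and all prices are hypothetical, and the model deliberately ignores depreciation, resale value, setup labor, and cloud data-transfer fees.

```python
from typing import Optional


def owned_cost(purchase_price: float, hourly_operating: float, hours_used: float) -> float:
    """Purchased server: the purchase price is paid regardless of usage."""
    return purchase_price + hourly_operating * hours_used


def cloud_cost(hourly_rate: float, hours_used: float) -> float:
    """Cloud instance: billed only while it is running."""
    return hourly_rate * hours_used


def break_even_hours(purchase_price: float,
                     hourly_operating: float,
                     hourly_rate: float) -> Optional[float]:
    """Hours of use beyond which owning becomes cheaper, or None if never."""
    if hourly_rate <= hourly_operating:
        return None
    return purchase_price / (hourly_rate - hourly_operating)


# Assumed prices: a 6000 server costing 0.25/hour to run vs. a 0.75/hour instance.
surge_hours = 720.0  # roughly one month of seasonal demand
print(owned_cost(6000.0, 0.25, surge_hours))  # prints 6180.0
print(cloud_cost(0.75, surge_hours))          # prints 540.0
print(break_even_hours(6000.0, 0.25, 0.75))   # prints 12000.0
```

Under these assumed numbers, a short surge strongly favors the cloud, while a server that would run continuously for well over a year starts to favor ownership.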

Conclusion

As any organization grows, its needs will also evolve. While the first requirement to protect a business will be resilience, performance becomes a more significant concern as server load grows. Server redundancy allows an IT manager to continue to address both issues, although any deployment will be constrained by budget and resources. Cloud resources can provide financial flexibility, but managers often overlook the added security, deployment, and integration difficulties. To be successful, managers need to balance their redundant server deployment so that it matches the priorities of the organization with the resources available to manage it.
