High Availability Hopeful in Helsinki



Dear Sloan,

I’m a systems operations lead at a pretty big company. We’re starting to upgrade some services to High Availability (HA) deployments. I’m pretty good at drawing diagrams and deployment charts…but I’m struggling to explain to my teammates why HA is so important! Can you give me a hand?


High Availability Hopeful in Helsinki

Howdy, HA Hopeful,

Yes, I think I can help you explain why High Availability (or HA) is so important. Congrats on making it an organizational priority! As I’ll explain, HA is about cutting down on disruptions, and both your users and your organization will thank you for the improvements.

What is High Availability?

High Availability refers to the ability of a service to stay available even when one of its parts fails. By contrast, in a more traditional deployment, a single failure takes the service offline until it’s fixed. Usually, HA is achieved through redundancy, failover mechanisms, and a proactive approach to monitoring.

Here’s an example with a database service. A High Availability deployment might have two identical databases that users connect to, with a load balancer in front directing traffic. If one database fails, the load balancer automatically redirects traffic to the remaining healthy database. Because the databases are kept in sync, no committed data is lost during the switch. (That guarantee depends on synchronous replication; with asynchronous replication, the most recent writes can be lost.)
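To make the database example concrete, here is a minimal sketch of the idea, not a production design. The class names `Database` and `ReplicatedPair` are hypothetical, in-memory dictionaries stand in for real databases, and a simple `healthy` flag stands in for the health checks a real load balancer would run. Writes go to every healthy copy before they count (synchronous replication), and reads go to the first healthy copy (failover).

```python
from typing import Dict, List, Optional


class Database:
    """A toy in-memory database node (hypothetical; stands in for a real server)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.healthy = True


class ReplicatedPair:
    """Two synchronously replicated databases behind a failover router (a sketch)."""

    def __init__(self, primary: Database, standby: Database) -> None:
        self.nodes: List[Database] = [primary, standby]

    def _active(self) -> Database:
        # Failover: route to the first node that is still healthy.
        for node in self.nodes:
            if node.healthy:
                return node
        raise RuntimeError("no healthy database available")

    def write(self, key: str, value: str) -> None:
        # Synchronous replication: apply the write to every healthy node
        # before returning, so the copies stay identical.
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy database available")
        for node in healthy:
            node.data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._active().data.get(key)


db_a, db_b = Database("db-a"), Database("db-b")
pair = ReplicatedPair(db_a, db_b)
pair.write("order:42", "paid")
db_a.healthy = False          # simulate the first database failing
print(pair.read("order:42"))  # served by db-b, which already has the write
```

One simplification worth calling out: when a failed node comes back, a real system has to resynchronize it before sending it traffic again, which this sketch skips.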

Why is it Important?

If a service — customer-facing or otherwise — goes down, then productivity and revenue take a hit. Worse, because digital infrastructure is often a sequence of connected tools, one breakage can lead to many more. 

But with a High Availability deployment, services are “on” more of the time. If one component of a service fails, its redundant counterparts take over, so users and other services shouldn’t notice. As a bonus, because the failure isn’t actively disruptive, it can be investigated and resolved at a more natural pace.
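To put a rough number on “on more of the time,” here is a back-of-the-envelope calculation. It assumes, hypothetically, that each server is up 99% of the time and that servers fail independently of one another. Real deployments only approximate that, since shared power, networks, or a bad deploy can take out every copy at once. Under those assumptions, going from one server to two cuts expected downtime from about 87.6 hours a year to under one hour.

```python
def combined_availability(per_node: float, nodes: int) -> float:
    """Availability of a redundant group that stays up while any node is up.

    Assumes node failures are independent (an idealization).
    """
    # The group is down only when every node is down at the same time.
    return 1 - (1 - per_node) ** nodes


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1 - availability) * 365 * 24


single = combined_availability(0.99, 1)  # hypothetical 99% per server
pair = combined_availability(0.99, 2)
print(f"one server:  {single:.4%} up, {downtime_hours_per_year(single):.1f} h/year down")
print(f"two servers: {pair:.4%} up, {downtime_hours_per_year(pair):.2f} h/year down")
```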

Businesses sometimes summarize their tolerance for disruption with two metrics: RTO and RPO. RTO stands for “Recovery Time Objective” and represents the longest disruption they can tolerate before a failed service is back up. RPO stands for “Recovery Point Objective” and represents how much data, measured as a window of time, they can afford to lose. High Availability helps with both targets at once. It doesn’t stop components from failing, but it keeps a single failure from turning into an outage (a shorter recovery time), and replication keeps a current copy of the data ready to take over (less lost data).
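One practical way to connect RTO and RPO to an HA deployment is to run a failover drill and compare what you measured against your targets. The sketch below assumes you recorded two numbers during the drill: how long it took until traffic was served again, and how far the standby was lagging behind when the primary failed. The class and field names are hypothetical, and the example figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RecoveryTargets:
    rto_seconds: float  # longest acceptable outage after a failure
    rpo_seconds: float  # largest acceptable window of lost data


@dataclass
class FailoverMeasurement:
    failover_seconds: float         # failure until traffic is served again
    replication_lag_seconds: float  # how far the standby trailed the primary


def meets_targets(targets: RecoveryTargets, measured: FailoverMeasurement) -> Dict[str, bool]:
    """Compare one failover drill against the business's RTO and RPO."""
    return {
        "rto_met": measured.failover_seconds <= targets.rto_seconds,
        "rpo_met": measured.replication_lag_seconds <= targets.rpo_seconds,
    }


# Hypothetical drill: fast failover, but asynchronous replication lagged 2.5 s.
targets = RecoveryTargets(rto_seconds=60, rpo_seconds=0)
drill = FailoverMeasurement(failover_seconds=12, replication_lag_seconds=2.5)
print(meets_targets(targets, drill))
```

In this made-up drill the failover is quick enough for the RTO, but any replication lag at all misses a zero-data-loss RPO, which is exactly the kind of gap an HA deployment alone won’t reveal until you measure it.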

(But! HA is not a full disaster recovery plan in and of itself. Major disasters still happen, and you need to think critically about your RTO and RPO even if you’re using an HA deployment.)

That’s why it’s so crucial for every business to consider HA deployments.

What HA Looks Like for You

High Availability Hopeful, because you’re an engineer, you already know that there’s not one correct HA model. Services have different infrastructure requirements. There are budget, scalability, and performance issues to consider. Existing disaster recovery plans and RTO/RPO targets complicate things further. And some businesses must comply with regulations that affect which kind of HA deployment they can use.

But if you’re struggling to explain why HA is important, fall back on those diagram-drawing skills you mentioned earlier. A super simplified diagram for an HA deployment would be a series of “Server” nodes grouped together, leading into a “Load Balancer” node, which leads into some “User” nodes.

What this diagram demonstrates is that a failure of any one Server node won’t cause disruptions. The load balancer simply directs traffic to a different, active node.
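If your teammates prefer code to boxes and arrows, the same diagram can be sketched in a few lines. This assumes round-robin balancing, which is only one common strategy, and the server names are hypothetical. A real load balancer would call `mark_down` from its own health checks; here it’s done by hand.

```python
from itertools import cycle
from typing import List, Set


class LoadBalancer:
    """Round-robin load balancer that skips servers marked as down (a sketch)."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.down: Set[str] = set()
        self._ring = cycle(self.servers)

    def mark_down(self, server: str) -> None:
        # Normally triggered by a failed health check.
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def route(self) -> str:
        # Try each server at most once for this request before giving up.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        raise RuntimeError("all servers are down")


lb = LoadBalancer(["server-1", "server-2", "server-3"])
lb.mark_down("server-2")               # one Server node fails
print([lb.route() for _ in range(4)])  # requests keep flowing to 1 and 3
```

Users keep getting answers while `server-2` is down; only when every server fails does `route` give up.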

And that’s the essence of why HA is so important for businesses. High Availability deployments cut down on disruptions. And that means that your users, both internal and external, can stay productive.

Thanks for writing in!


