Design for Failure – RDS Databases (AWS – Cloud)

In today’s world, technology no longer operates on an 8 a.m. to 5 p.m. business model. It is a 24/7 business that requires systems that are highly available, fault tolerant, and designed for failure. But what exactly does this mean?

Let us look at it from a different perspective. You drive down the road in a car that has four wheels. Some drivers are conscious about saving weight and want better mpg and extra storage capacity in the trunk, so they go without a spare tire. Their motto is “I’ll deal with it if it happens.” When a tire does blow out, they end up calling a tow truck, missing their appointments, and paying a very expensive tow bill.

However, with very little planning this can be avoided. Vehicles come with a spare tire that is pumped up and ready to be put into service at any moment. If you get a flat tire, you can stop the vehicle, jack up the car, swap the bad tire for the spare, and continue at a slower pace until you can fix or replace the tire and put the car fully back into service.

That spare tire is like a “high availability system”: it can be swapped in when a tire blows out. It also provides fault tolerance, because although your car ride is disrupted, you can continue to your destination without calling a tow truck and still make your appointments.

With cloud-based systems there are a number of ways to plan for this. Everything from the databases to the front-end EC2 instances (virtual computer systems) can be designed to mitigate these very issues.


Amazon Web Services offers a few built-in features right out of the box to help design your basic database architecture. With RDS (Relational Database Service) you can choose a high availability (Multi-AZ) deployment. This means that if the underlying RDS instance ever has problems or becomes unavailable, RDS can automatically fail over to a standby instance in a different AZ (Availability Zone) within the same region.
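
As a minimal sketch of what choosing that option can look like in code, the Python snippet below builds the request parameters for the `create_db_instance` call in boto3 (the AWS SDK for Python) with `MultiAZ` switched on. The identifier, engine, instance class, and storage size are placeholder assumptions rather than values from this article, and the real API call is left as a comment so the snippet runs without AWS credentials.

```python
def multi_az_instance_params(identifier, username, password):
    """Build keyword arguments for boto3's rds.create_db_instance.

    Every concrete value below is an illustrative assumption.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",                  # assumed engine
        "DBInstanceClass": "db.t3.medium",  # assumed instance size
        "AllocatedStorage": 20,             # GiB, assumed
        "MasterUsername": username,
        "MasterUserPassword": password,
        "MultiAZ": True,                    # ask RDS for a standby in another AZ
    }


params = multi_az_instance_params("orders-db", "dbadmin", "change-me")
print(params["MultiAZ"])

# To actually create the instance (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("rds").create_db_instance(**params)
```

In practice the password would come from a secrets store or an environment variable rather than a literal string.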

As the RDS master database is modified, RDS replicates those transactions to the standby instance in near real time to ensure that both databases are always in sync.
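
Because the standby is kept in sync, and because the database endpoint name stays the same after an RDS high availability (Multi-AZ) failover, an application can usually recover by reconnecting and retrying. The sketch below shows one way to do that with exponential backoff. The helper name `run_with_reconnect` and the fake connection are hypothetical, and it assumes the driver signals a dropped connection with Python's built-in `ConnectionError`; real database drivers raise their own exception types, which you would catch instead.

```python
import time


def run_with_reconnect(connect, operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run operation(conn), reconnecting with exponential backoff on ConnectionError."""
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()  # open a fresh connection on every attempt
            return operation(conn)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError("database still unavailable after retries") from last_error


# Demo: a fake connection that fails twice, as if a failover were in progress.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary unavailable")
    return "connection-to-standby"


result = run_with_reconnect(
    fake_connect,
    lambda conn: f"query ran on {conn}",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)  # query ran on connection-to-standby
```

The retry count and delays are arbitrary starting points; tune them to how long a failover takes in your own environment.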

However, when you have systems that require very heavy database reads, you can designate an RDS master that performs all the database writes and attach read replicas to it. Read replicas take read queries off the master database, allowing it to focus on writes and updates. The accompanying diagram shows an example of a highly available, fault-tolerant system that can handle a high volume of queries under heavy database load.
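
On the application side, splitting reads from writes might be sketched as below. The `ReadWriteRouter` class and the endpoint names are hypothetical, and the routing rule (look at the first SQL keyword) is a deliberate simplification, not how any particular driver or AWS library does it.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master endpoint and spread reads across replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the master.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._readers)  # round-robin over the read replicas


router = ReadWriteRouter(
    "master.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 7"))
```

Read replicas are updated asynchronously, so a read that must see a write made a moment ago, or a statement such as `SELECT ... FOR UPDATE`, is better sent to the master; this sketch does not handle those cases.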

If you are looking to design highly available, fault-tolerant systems that are designed for failure but are not sure how to do this in a cloud-based environment, feel free to contact me with any of your questions. Remember, it’s not IF it happens, it’s WHEN it happens that you need to prepare for. And if you are prepared for failure, you will succeed in recovering quickly.

Written by: Travis Haag
Note: all images were courtesy of Amazon Web Services (AWS).
