Have you ever wondered how web and mobile applications like Gmail, Amazon, Instagram, or Facebook are designed and how the different parts of their systems actually work? Or maybe you thought of your own app idea and wondered what it would take to bring it to life? Well, you are in luck, because that is what we are talking about today. Even if you haven’t thought about it, remember that “design Twitter for me” is a very popular interview question, so a good understanding of system design is going to be really useful. Let’s get started.

When it comes to system design, Donne Martin’s System Design Primer is my favorite place to hang out. However, if you are just starting out, you might feel a bit overwhelmed. I will try to simplify this for you, but once you’re done reading here, I highly recommend checking out his page for more detail and to learn what some of the top-tier companies are doing with their application stacks.

Basic Web or Mobile Application System Design

Let’s start with a simple design for a web or mobile application. At the very basic level you have a front-end application, a backend server application, and a database for storing your application data.

To understand this better let’s follow a typical request from left to right:

Web or Mobile Request Flow

1. Imagine a user clicking a link or tapping a button in the user interface of the front-end mobile or web application. This application runs on the user’s computer or mobile device.

2. Based on the tap, some action may need to happen on the backend to respond to the user’s request. In most cases this is done by sending an HTTP request over the internet to the backend service.

3. The backend application may then need to connect to the database and fetch, create, update, or delete data for the requested function.

Example: In a Facebook-like application, (1) the user likes or comments on a post on the client, (2) the server application receives that request and figures out what to do, and (3) it asks the database to update the relevant record. At that point the server application can return the response to the client, which in turn displays it to the user.
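
To make the three steps concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: an in-memory SQLite database stands in for the real database, the `posts` table and the `handle_like` function are hypothetical names, and the HTTP layer is reduced to passing a JSON string to the handler.

```python
import json
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER NOT NULL)")
db.execute("INSERT INTO posts (id, likes) VALUES (1, 0)")
db.commit()


def handle_like(request_body: str) -> str:
    """Hypothetical backend handler for a 'like' request."""
    # Step 2: the server receives the request and works out what to do.
    payload = json.loads(request_body)
    post_id = payload["post_id"]

    # Step 3: the server asks the database to update the relevant record.
    cursor = db.execute(
        "UPDATE posts SET likes = likes + 1 WHERE id = ?", (post_id,)
    )
    db.commit()
    if cursor.rowcount == 0:
        return json.dumps({"status": 404, "error": "post not found"})

    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    # The server returns the response for the client to display.
    return json.dumps({"status": 200, "post_id": post_id, "likes": row[0]})


# Step 1: the client turns the user's tap into a request.
print(handle_like(json.dumps({"post_id": 1})))
```

In a real application the handler would sit behind a web framework and the request would arrive over HTTP, but the shape of the flow is the same.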

Improving performance

Our next step is to improve the performance of the application. This is important because when a user taps or clicks a button in the UI, we want them to get a response immediately, or as soon as possible, to keep them engaged. Google’s research shows a direct correlation between a site’s response time and users sticking around.

Performance is measured by how fast you can fulfill a single request.

Note that performance is different from scalability: scalability measures how many requests you can handle simultaneously. We’ll talk about that in the next section. To improve performance we are going to use a technique called caching. A cache holds previously computed or fetched results, usually in memory, so we can serve them right away with little or no overhead. In the figure below I am using three kinds of caches.

Figure 2: Improving Performance with Caching and Multiple Databases

The first is a database cache, which stores and returns results from an in-memory cache server and significantly improves database fetch time. The second is a local cache on the application server, which saves us from going over the network for data. The third is a cache in the mobile or web client application, which avoids the whole trip to the server and database. I recommend reading my Scaling to Millions post to learn more about this.
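
Here is a rough sketch of the caching idea in Python. It assumes a simple "check the cache first, otherwise load and store" approach with a time-to-live. The `TTLCache` class and the `slow_db_fetch` function are made up for illustration, and a plain dictionary stands in for a real cache server.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1          # served from memory, no database trip
            return entry[1]
        self.misses += 1            # missing or expired: go to the source
        value = loader(key)
        self._store[key] = (now + self.ttl, value)
        return value


def slow_db_fetch(user_id: str) -> dict:
    """Hypothetical database call; the sleep simulates query latency."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_load("42", slow_db_fetch)   # miss: pays the 50 ms
second = cache.get_or_load("42", slow_db_fetch)  # hit: returned from memory
print(cache.hits, cache.misses)                  # 1 1
```

The expiry time is the trade-off to think about: a longer one means fewer trips to the database but a higher chance of serving stale data.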

Improving Scalability

Alright, so we now have a single user doing very well in our app, but what happens when we grow from 1 to 1,000 users? The application starts to become slow and users start to see timeout errors. Why is that?

The problem is that the application and database servers cannot handle all the simultaneous requests; they are maxed out in terms of both network traffic and CPU. Time to think about scalability.

Scalability is defined by the number of concurrent requests you can handle.

The first thing that may come to mind is to get a more powerful server and database system. This is called scaling vertically. There are two main problems with vertical scaling. First, it can get very expensive to keep upgrading. Second, if your traffic goes down during certain times, there is no easy way to scale back down.

Scaling Horizontally

A better scaling option is to have a cluster of servers where any one of them can handle a request. You can change the number of servers up and down as needed. The following diagram shows horizontal scaling.

Figure 3: Scalability Improvements
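
The "up and down as needed" part of horizontal scaling boils down to a sizing calculation. The sketch below is one simple way to do it, not a standard formula: the `servers_needed` function, the per-server capacity figure, and the 20% headroom are all assumptions for illustration.

```python
def servers_needed(
    requests_per_second: int,
    capacity_per_server: int,
    headroom_percent: int = 20,
    min_servers: int = 1,
) -> int:
    """Estimate the cluster size for the current traffic.

    capacity_per_server is how many requests per second one server can
    handle (an assumed, measured number). headroom_percent adds spare
    capacity so a traffic spike does not immediately overload the cluster.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    numerator = requests_per_second * (100 + headroom_percent)
    denominator = 100 * capacity_per_server
    required = (numerator + denominator - 1) // denominator  # ceiling division
    return max(min_servers, required)


print(servers_needed(1000, 100))  # peak traffic: 12
print(servers_needed(150, 100))   # quiet hours: 2
```

With vertical scaling you would be paying for the peak-sized machine all day; here the cluster can shrink when traffic drops.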

Load Balancers

Note that we have added a load balancer to our system. A load balancer sits between your client application and server application when you have multiple servers running. It picks one of the servers to handle each request based on some predetermined rule, such as taking turns. This improves the scalability of the system since the load is distributed and no single server is overwhelmed. Having multiple application servers also improves the “high availability” of our application, since the application continues to work if a single server node fails.
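
One common "predetermined formula" is round robin, where servers simply take turns. Here is a minimal sketch that also skips servers marked as down, which is where the availability benefit comes from. The `RoundRobinBalancer` class and the server names are hypothetical; real load balancers also run health checks, and many use other strategies.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Pick servers in turn, skipping any that are marked as down."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._down: Set[str] = set()
        self._next = 0

    def mark_down(self, server: str) -> None:
        self._down.add(server)

    def mark_up(self, server: str) -> None:
        self._down.discard(server)

    def pick(self) -> str:
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

balancer.mark_down("app-2")                 # simulate a failed node
print([balancer.pick() for _ in range(3)])  # ['app-3', 'app-1', 'app-3']
```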

Database Cluster

Note how we have also created a “read replica” of the database that we use for reading data. This makes responses faster since the master database can focus on write operations. Splitting your tasks like this significantly improves the performance and response time of your application. We’ve also split our database and “sharded” (or distributed) the data across several servers. With sharding, each database server is responsible for only part of the data. This improves scalability but does not help with availability, since if a database node fails you will lose access to the data on it. We will talk about that in the next section.
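
Both ideas come down to a routing decision: which shard owns this key, and should the query go to the master or to a replica? The sketch below shows one simple way to decide, assuming hash-based sharding. The `shard_for` function, the `DatabaseRouter` class, and the node names are hypothetical, and real databases often handle this routing for you.

```python
import hashlib
import itertools
from typing import List, Tuple


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class DatabaseRouter:
    """Send writes to a shard's master and reads to one of its replicas."""

    def __init__(self, shards: List[Tuple[str, List[str]]]) -> None:
        # Each shard is (master_name, [replica_names]).
        if not shards:
            raise ValueError("at least one shard is required")
        self._masters = [master for master, _ in shards]
        # A shard without replicas falls back to its master for reads.
        self._readers = [
            itertools.cycle(replicas or [master]) for master, replicas in shards
        ]

    def node_for(self, key: str, is_write: bool) -> str:
        index = shard_for(key, len(self._masters))
        if is_write:
            return self._masters[index]
        return next(self._readers[index])


router = DatabaseRouter([
    ("db0-master", ["db0-replica-a", "db0-replica-b"]),
    ("db1-master", ["db1-replica-a"]),
])
print(router.node_for("user:42", is_write=True))   # that key's master
print(router.node_for("user:42", is_write=False))  # a replica of the same shard
```

One caveat with this simple modulo approach: changing the number of shards moves most keys to a different shard, so resharding needs planning.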

Further reading

Now that you have a good understanding of the basics, you may consider reading more on the following topics:

High Availability (HA)

HA means that the system can continue to operate even if a server or database node fails. We touched upon it in the scalability section. You can read here for more information. It is a big topic and perhaps I will talk about it in another post.


Microservices

Over time your server application can become very large and complex, which makes it hard to modify and update. Microservices are a great way to solve this problem: you split the server application into smaller apps that each focus on a single task or job. Another benefit of microservices is that they can sometimes be reused between applications. You can read more about them here.

Job Workers

Job workers are a great way to perform tasks in the background while your main application deals with live requests. You can read more about them in my previous post about using message queues for job workers.
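
As a small illustration of the pattern, the sketch below uses Python’s built-in `queue` and `threading` modules. It rests on a few assumptions: the in-process queue stands in for a real message queue, and `send_welcome_email` and `run_workers` are hypothetical names for a slow task and its runner.

```python
import queue
import threading
from typing import List


def send_welcome_email(address: str) -> str:
    """Hypothetical slow task; a real one would call an email service."""
    return f"welcome email sent to {address}"


def run_workers(addresses: List[str], worker_count: int = 2) -> List[str]:
    """Queue one job per address and let background workers process them."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()

    def worker() -> None:
        while True:
            address = jobs.get()
            if address is None:   # sentinel: no more work for this worker
                break
            results.put(send_welcome_email(address))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # The main application only enqueues jobs and moves on.
    for address in addresses:
        jobs.put(address)
    for _ in threads:
        jobs.put(None)            # one sentinel per worker

    for thread in threads:
        thread.join()
    return sorted(results.get() for _ in range(results.qsize()))


print(run_workers(["ann@example.com", "bob@example.com"]))
```

In production the queue would usually live in a separate service so that jobs survive a restart of the application.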

Content Delivery Networks (CDN)

Content delivery networks improve the performance and reliability of your application by serving content and data from locations closer to the user geographically. Read What is a CDN by the Cloudflare team to get started.

Advanced system design concepts

As I mentioned in the beginning, system design for modern web and mobile applications is a complicated subject. You should consider reading Donne Martin’s page when you are ready to dive deeper.

As always, I hope you found this post useful. Feel free to reach out if you have any questions or comments.