Load balancing distributes traffic among multiple servers to improve a service or application’s performance and reliability.

Load balancing is the practice of distributing computational workloads between two or more computers. On the Internet, it is often used to divide network traffic among several servers. This reduces the strain on each server and lets the servers operate more efficiently, which speeds up performance and reduces latency. Most Internet applications depend on load balancing to function correctly.

Imagine a grocery store with eight checkout lines, only one of which is open. All customers must join the same line, so it takes a long time for each customer to finish paying for their groceries. Now imagine that the store opens all eight checkout lines instead. In this case, the wait time for customers is roughly one-eighth as long (depending on factors like how much food each customer is buying).

Load balancing accomplishes essentially the same thing. By dividing user requests among multiple servers, it greatly reduces user wait times. This results in a better user experience: the grocery store customers in the example above would probably look for a more efficient store if they always faced long waits.

How does load balancing work?

Load balancing is handled by a tool or application called a load balancer. A load balancer can be either hardware-based or software-based. Hardware load balancers require the installation of a dedicated load-balancing device; software-based load balancers can run on a server, on a virtual machine, or in the cloud. Content delivery networks (CDNs) often include load-balancing features.

When a request arrives from a user, the load balancer assigns it to a given server, and this process repeats for each request. The load balancer chooses the server for each request using a load balancing algorithm. These algorithms fall into two main categories: static and dynamic.

Static load balancing algorithms

Static load balancing algorithms distribute workloads without considering the system’s current state. A static load balancer does not know which servers are performing slowly or are underutilized. Instead, it assigns workloads based on a predetermined plan. Static load balancing is quick to set up but can result in inefficiencies.

Referring back to the analogy above, imagine that the grocery store with eight open checkout lines has an employee who directs customers into the lines. This employee goes in order, assigning the first customer to line 1, the second to line 2, and so on, without looking back to see how quickly the lines are moving. If all eight cashiers work efficiently, this system works fine; but if one or more of them lags behind, some lines may become far longer than others, resulting in a bad experience for those customers. Static load balancing carries the same risk: individual servers can still become overburdened.

Round robin DNS and client-side random load balancing are two common forms of static load balancing.
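The two static approaches named above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DNS server or client library works: real round robin DNS rotates the order of the address records it returns, which this sketch models as a simple rotation over a list. The server addresses are hypothetical placeholders. Neither function looks at server load or health, which is exactly what makes them static.

```python
import itertools
import random

# Hypothetical backend pool (addresses from the 192.0.2.0/24 documentation range).
SERVERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


def pick_random(servers, rng=random):
    """Client-side random selection: each client independently picks any server."""
    return rng.choice(servers)


if __name__ == "__main__":
    rr = RoundRobin(SERVERS)
    # Requests 1-5 go to servers 1, 2, 3, 1, 2 regardless of load.
    print([rr.pick() for _ in range(5)])
    print(pick_random(SERVERS))
```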

Dynamic load balancing algorithms

Dynamic load balancing algorithms consider each server’s availability, workload, and health. They can shift traffic from overburdened or poorly performing servers to underutilized servers, keeping the distribution even and efficient. However, dynamic load balancing is more difficult to configure. Several factors play into server availability: each server’s health and overall capacity, the size of the tasks being distributed, and so on.

Suppose the grocery store employee who sorts the customers into checkout lines uses a more dynamic approach: the employee watches the lines carefully, sees which are moving the fastest, observes how many groceries each customer purchases, and assigns the customers accordingly. This may ensure a more efficient experience for all customers, but it also puts a tremendous strain on the line-sorting employee.

Common dynamic load balancing algorithms include least connection, weighted least connection, resource-based, and geolocation-based load balancing.
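Two of these dynamic algorithms, least connection and weighted least connection, are simple enough to sketch. The `Server` class and its `weight` field are hypothetical; in this sketch the weight stands for relative capacity, so a server with weight 4 is treated as able to handle four times the connections of a server with weight 1. A real load balancer would also increment and decrement the connection counts as requests start and finish, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0
    weight: int = 1  # hypothetical relative capacity; higher means a bigger server


def least_connections(servers):
    """Pick the server with the fewest currently open connections."""
    return min(servers, key=lambda s: s.active_connections)


def weighted_least_connections(servers):
    """Pick the server with the lowest connections-to-capacity ratio."""
    return min(servers, key=lambda s: s.active_connections / s.weight)


if __name__ == "__main__":
    pool = [
        Server("small", active_connections=4, weight=1),
        Server("large", active_connections=10, weight=4),
    ]
    print(least_connections(pool).name)           # small (4 < 10)
    print(weighted_least_connections(pool).name)  # large (10/4 = 2.5 < 4/1)
```

With these numbers, plain least connection sends the next request to the small server because it has fewer open connections, while the weighted version prefers the large server because its load relative to its capacity is lower.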

Where is load balancing used?

As discussed above, load balancing is often used with web applications. Software-based and cloud-based load balancers help distribute Internet traffic evenly between servers that host the application. Some cloud load balancing products can balance Internet traffic loads across servers spread out worldwide, a process known as global server load balancing (GSLB).
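Geolocation-based routing, one way a global load balancer might choose a location, can be sketched as "send each client to the nearest data center." This is a simplified illustration under several assumptions: the data-center names and coordinates are hypothetical, the client's location is assumed to be known already (in practice it is often estimated from the client's IP address), and distance is measured as great-circle distance with the haversine formula. Real GSLB products may also weigh latency, capacity, and health.

```python
import math

# Hypothetical data centers with approximate (latitude, longitude) in degrees.
DATA_CENTERS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_data_center(client_location, candidates=DATA_CENTERS):
    """Return the name of the candidate data center closest to the client."""
    return min(candidates, key=lambda name: haversine_km(client_location, candidates[name]))


if __name__ == "__main__":
    print(nearest_data_center((48.9, 2.35)))   # a client near Paris -> eu-west
    print(nearest_data_center((35.7, 139.7)))  # a client near Tokyo -> ap-southeast
```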

Load balancing is also commonly used within large localized networks, like those within a data center or a large office complex. Traditionally, this has required hardware appliances such as an application delivery controller (ADC) or a dedicated load-balancing device. Software-based load balancers are also used for this purpose.

What is server monitoring?

Dynamic load balancers need to know each server’s health: its current status, how well it is performing, and so on. They monitor servers by running regular health checks. If a server or group of servers is performing slowly, the load balancer sends less traffic to it. If a server or group of servers fails, the load balancer reroutes traffic to another group of servers, a process known as “failover.”
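The health-check behavior described above can be sketched as follows. This is a minimal illustration, not any product's implementation: `tcp_probe` measures how long a TCP connection takes to open (real health checks are often HTTP requests to a status endpoint instead), and the rule that turns latency into a traffic weight (proportional to 1 / latency, zero for an unreachable server) is an assumption chosen for simplicity. The probe is passed in as a function so the selection logic can be exercised without a network.

```python
import random
import socket
import time


def tcp_probe(address, timeout=1.0):
    """Return TCP connect latency in seconds, or None if `address` is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection(address, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def run_health_checks(servers, probe=tcp_probe):
    """Turn probe results into traffic weights.

    Unreachable servers get weight 0 (no traffic); slower servers get
    proportionally less traffic than faster ones.
    """
    weights = {}
    for server in servers:
        latency = probe(server)
        weights[server] = 0.0 if latency is None else 1.0 / max(latency, 0.001)
    return weights


def pick(weights, rng=random):
    """Choose a server at random, in proportion to its health-check weight."""
    healthy = {s: w for s, w in weights.items() if w > 0}
    if not healthy:
        raise RuntimeError("no healthy servers available")
    return rng.choices(list(healthy), weights=list(healthy.values()))[0]


if __name__ == "__main__":
    # Simulated probe results instead of real network calls.
    fake_latency = {"a": 0.01, "b": 0.05, "c": None}  # "c" is down
    weights = run_health_checks(fake_latency, probe=fake_latency.get)
    print(weights)  # "a" gets about 5x the weight of "b"; "c" gets 0.0
    print(pick(weights))
```

Re-running `run_health_checks` on a schedule and feeding the new weights to `pick` gives the behavior described: slow servers receive less traffic and failed servers receive none.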

What is failover?

Failover occurs when a server stops functioning and the load balancer redirects its usual workload to a secondary server or group of servers. Failover is crucial for reliability: without a backup, a single server crash could bring down a website or application. Failover must happen quickly to avoid a gap in service.
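Failover from a primary to a secondary group of servers can be sketched as an ordered list of pools, where traffic goes to the first pool that still has a healthy server. The pool names, addresses, and the `is_healthy` callback (standing in for the results of the latest health checks) are hypothetical. A real load balancer would also need a policy for failing back to the primary once it recovers, which this sketch leaves out.

```python
def choose_pool(pools, is_healthy):
    """Return (pool_name, healthy_servers) for the highest-priority usable pool.

    `pools` is an ordered list of (name, servers) pairs, primary first,
    followed by backups. `is_healthy(server)` reports the latest health check.
    """
    for name, servers in pools:
        healthy = [s for s in servers if is_healthy(s)]
        if healthy:
            return name, healthy
    raise RuntimeError("all pools are down")


if __name__ == "__main__":
    pools = [
        ("primary", ["192.0.2.10", "192.0.2.11"]),
        ("secondary", ["198.51.100.10"]),
    ]
    down = {"192.0.2.10", "192.0.2.11"}  # both primary servers failed their checks
    print(choose_pool(pools, lambda s: s not in down))
    # ('secondary', ['198.51.100.10'])
```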
