Load Balancing Explained: Distributing Traffic Smartly
As soon as you use more than one server, you need something to decide which server handles which request. That's what a load balancer does. Simple concept, but the details make all the difference.

A load balancer sits between the visitor and your application servers. It receives every incoming request and forwards it to one of the available servers in your cluster. That sounds straightforward, but there's quite a bit of intelligence behind that forwarding logic.

Why load balancing?

The most obvious reason is capacity: a single server can only handle so much. By spreading traffic across multiple servers, you increase total processing power. But there's more:

  • Availability: if one server goes down, the load balancer takes it out of rotation. Visitors notice nothing.
  • Maintenance without downtime: you can update servers one by one while the rest handle the traffic.
  • Security: your application servers aren't directly reachable from the outside. Only the load balancer is exposed to the internet.
  • Health checks: the load balancer continuously monitors whether servers are available and doesn't send traffic to servers that aren't responding.

The main algorithms

How a load balancer decides which server receives a request depends on the chosen algorithm. The common options:

Round Robin is the default. Each request goes to the next server in the list, taking turns. Simple and fair when servers are equal. The downside is that it doesn't account for how heavily servers are already loaded.
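As a minimal sketch (server names are placeholders), round robin is nothing more than cycling through the pool:

```python
from itertools import cycle

# Hypothetical server pool; the names are placeholders.
servers = ["web1", "web2", "web3"]
rotation = cycle(servers)

def next_server():
    """Return the next server in strict rotation, regardless of load."""
    return next(rotation)
```

Note that nothing here looks at how busy a server is, which is exactly the limitation described above.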

Least Connections sends each request to the server with the fewest active connections at that moment. This works better when requests vary significantly in duration, so long queries, uploads or API calls don't all stack up on one server.
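A least-connections picker is simply a minimum over per-server connection counts, sketched here with hypothetical counters:

```python
def least_connections(conns):
    """Pick the server with the fewest active connections.

    `conns` maps server name -> current active connection count.
    """
    return min(conns, key=conns.get)

# Illustrative snapshot of a cluster's active connections.
active = {"web1": 12, "web2": 3, "web3": 7}
```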

IP Hash (or Sticky Sessions) always sends the same visitor to the same server, based on IP address or a cookie. This is useful if your application stores session data locally. The downside: if that server fails, the session is lost. A more robust approach is shared session storage, for example in Redis.
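IP Hash can be sketched as a stable hash of the client address modulo the pool size (the addresses and server names below are made up):

```python
import hashlib

def ip_hash(client_ip, servers):
    """Map a client IP deterministically onto one server.

    Uses MD5 for a stable digest: Python's built-in hash() is salted
    per process, which would break stickiness across restarts.
    """
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

The same IP always lands on the same server; only when the pool size changes does the mapping shift.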

Weighted Round Robin distributes traffic based on weights. A server with 16 cores then receives twice as many requests as a server with 8 cores. Useful for heterogeneous clusters.
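One simple way to sketch weighted round robin is to repeat each server in the rotation according to its weight (the weights and names below are illustrative):

```python
from itertools import cycle

def weighted_pool(weights):
    """Expand {server: weight} into a repeating rotation.

    A server with weight 2 appears twice per cycle, so it receives
    twice as many requests as a weight-1 server.
    """
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return cycle(expanded)

# Hypothetical cluster: "big16" has twice the capacity of "small8".
pool = weighted_pool({"big16": 2, "small8": 1})
```

Production balancers interleave weights more smoothly, but the proportions per cycle are the same.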

Least Response Time combines server load and response time to choose the optimal server at that moment. More advanced, but more effective for large clusters with variable load.

Load balancing algorithms compared

Algorithm             Complexity   Sessions    Best for
Round Robin           Low          Not ideal   Equal servers, stateless apps
Least Connections     Medium       Neutral     Variable request duration, APIs
IP Hash               Low          Good        Legacy apps with local sessions
Weighted              Medium       Not ideal   Mixed server capacities
Least Response Time   High         Neutral    Large clusters, high performance needs

Layer 4 vs Layer 7: at which level does the load balancer work?

Load balancers operate at two levels of the OSI model, and this distinction is practically relevant.

Layer 4 (transport): the load balancer only looks at IP address and TCP/UDP port. It has no idea what's in the payload. This is extremely fast and uses few resources, but offers little flexibility. You can't make decisions based on URL path, HTTP headers or cookies.

Layer 7 (application): the load balancer understands HTTP. It can route traffic based on URL patterns, host headers, cookies and query parameters. This allows you to send API traffic to different servers than regular page requests, or run A/B tests at the infrastructure level. It costs slightly more overhead, but offers many more possibilities.
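As an illustration, path-based Layer 7 routing might look like this in NGINX; the upstream names and addresses are placeholders:

```nginx
# Hypothetical upstream groups; replace with your own backends.
upstream api_servers  { server 10.0.0.11:8080; server 10.0.0.12:8080; }
upstream page_servers { server 10.0.0.21:8080; server 10.0.0.22:8080; }

server {
    listen 80;

    # API traffic goes to dedicated servers...
    location /api/ {
        proxy_pass http://api_servers;
    }

    # ...everything else to the regular page pool.
    location / {
        proxy_pass http://page_servers;
    }
}
```

A Layer 4 balancer cannot make this distinction, because it never sees the request path.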

For most web applications, Layer 7 is the right choice. Layer 4 is interesting for protocols other than HTTP, or when you need absolute maximum throughput.

HAProxy and NGINX: the two standards

In practice, two tools are used most for load balancing:

HAProxy is built for one purpose: load balancing. It's extremely stable, has extensive built-in statistics and supports both Layer 4 and Layer 7. The configuration syntax is specific and takes some getting used to, but the control you get is substantial. HAProxy is the choice when load balancing is the only thing the server needs to do.
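A minimal HAProxy sketch, with placeholder backend addresses, could look like this:

```haproxy
frontend www
    bind *:80
    default_backend web_pool

backend web_pool
    balance leastconn              # or: roundrobin
    option httpchk GET /health     # HTTP health check per server
    server web1 10.0.0.11:8080 check
    server web2 10.0.0.12:8080 check
```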

NGINX is primarily a web server that can also do load balancing. For many setups, that's sufficient: you combine reverse proxy functionality with load balancing in one configuration file. The syntax is more accessible. NGINX is the right choice when your load balancer also needs to handle SSL termination, static files or other web server functions.
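In NGINX, choosing the balancing algorithm and adding weights happens in the upstream block (addresses below are placeholders):

```nginx
upstream app {
    least_conn;                      # Least Connections instead of round robin
    server 10.0.0.11:8080 weight=2;  # weighted: this server gets twice the share
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://app;
    }
}
```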

For large-scale, dedicated load balancing, HAProxy performs better. For integrated setups, NGINX is more practical.

SSL termination at the load balancer

A common approach is to handle SSL connections at the load balancer (SSL termination). The load balancer decrypts the HTTPS traffic and forwards it unencrypted to the application servers on the internal network. Advantages:

  • Certificate management in one place, not on each server separately
  • Application servers don't have to process SSL overhead
  • Centralisation of TLS configuration and cipher policy

On the internal network (between load balancer and application servers), traffic is then unencrypted. In a well-isolated private network environment, that's acceptable. If your compliance policy requires end-to-end encryption, you use SSL passthrough or re-encryption.
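A terminating NGINX frontend might look roughly like this; the certificate paths and backend address are placeholders:

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/ssl/example.pem;    # placeholder paths
    ssl_certificate_key /etc/ssl/example.key;

    location / {
        # Traffic continues unencrypted on the internal network.
        proxy_pass http://10.0.0.11:8080;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```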

Typical load balancer topology

            Internet / CDN
                  │  HTTPS :443
          Load Balancer (HA-pair)
                  │  HTTP :80 (internal)
      ┌───────────┼───────────┐
    Web 1       Web 2       Web 3   (+ auto-scale)
      └───────────┼───────────┘
                  │  DB connection
          DB Primary ── DB Replica

HA-pair = two load balancers with failover: if one fails, the other takes over automatically.

Health checks: continuous monitoring

A load balancer continuously performs health checks on the servers behind it. This can be simple (TCP connection succeeds) or more advanced (a specific HTTP endpoint returns 200). As soon as a server stops responding, it's removed from rotation until it becomes available again.

Configuring good health checks is an underestimated step. A check that's too superficial lets a failing server stay in rotation too long. A check that's too heavy causes unnecessary load. A good practice: have the application expose a dedicated /health endpoint that also verifies the database connection and critical dependencies.
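The aggregation logic behind such a /health endpoint can be sketched as a pure function (the checks below are stand-ins for real dependency probes):

```python
def health_status(checks):
    """Aggregate dependency checks into an HTTP-style health result.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is reachable. Returns (status, detail)
    the way a /health endpoint might: 200 only if every check passes.
    """
    results = {name: bool(check()) for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, results

# Hypothetical checks; real ones would ping the database, cache, etc.
status, detail = health_status({
    "database": lambda: True,
    "cache":    lambda: True,
})
```

The load balancer only needs the status code; the detail body helps operators see which dependency failed.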

When do you need load balancing?

Not every website needs a load balancer. A single server with good caching can handle quite a lot of traffic. Load balancing becomes relevant when:

  • Your application no longer fits on one server (in terms of capacity or risk)
  • You need zero-downtime deployments
  • Peak moments are unpredictable and you want to use auto-scaling
  • You have high availability requirements (99.9% or higher)

For webshops with more than a few hundred concurrent visitors, a load balancer is no longer a luxury. It's the foundation of a reliable infrastructure.

Conclusion

Load balancing isn't a complicated concept, but the implementation details matter. Algorithm, layer, health checks, session handling: each choice has consequences for performance and availability. A well-configured load balancer is the backbone of any serious web environment.

Need help with load balancing? Explore our high-traffic solutions or e-commerce hosting.
