Benchmarking 5 Popular Load Balancers: Nginx, HAProxy, Envoy, Traefik, and ALB
Available documentation
The complete HAProxy documentation is contained in several documents. Please consult the relevant one to save time and to get the most accurate answer to your needs. Also, please refrain from sending questions to the mailing list when the answers are already present in these documents. The configuration manual is the one to use when a configuration change is needed.
The coding style guide explains the style to adopt for the code. It is not very strict and not all of the code base completely respects it, but contributions which diverge too much from it will be rejected.
Quick introduction to load balancing and load balancers
Load balancing consists of aggregating multiple components in order to achieve a total processing capacity above each component's individual capacity, without any intervention from the end user and in a scalable way. This results in more operations being performed simultaneously in the time it takes a single component to perform only one.
A single operation, however, will still be performed on a single component at a time and will not get faster than without load balancing. It always requires at least as many operations as available components, and an efficient load balancing mechanism, to make use of all components and to fully benefit from the load balancing. A good example of this is the number of lanes on a highway, which allows more cars to pass during the same time frame without increasing their individual speed.
Examples of load balancing:
- process scheduling in multi-processor systems;
- link load balancing (e.g. EtherChannel, bonding);
- IP address load balancing (e.g. ECMP, DNS round-robin);
- server load balancing (via load balancers).
The mechanism or component which performs the load balancing operation is called a load balancer.
In web environments these components are called a "network load balancer", and more commonly a "load balancer", given that this activity is by far the best known case of load balancing.
A load balancer may act:
- at the link level: this is called link load balancing, and it consists of choosing which network link to send a packet to;
- at the network level: this is called network load balancing, and it consists of choosing which route a series of packets will follow;
- at the server level: this is called server load balancing, and it consists of deciding which server will process a connection or request.
Two distinct technologies exist and address different needs, though with some overlap. In each case it is important to keep in mind that load balancing consists of diverting the traffic from its natural flow, and that doing so always requires a minimum of care to maintain the required level of consistency between all routing decisions. The first one acts at the packet level and processes packets more or less individually.
There is a 1-to-1 relation between input and output packets, so it is possible to follow the traffic on both sides of the load balancer using a regular network sniffer. This technology can be very cheap and extremely fast.
Usually stateless, it can also be stateful (taking into account the session a packet belongs to; this is called layer 4 load balancing, or L4). It may support DSR (direct server return, where the response reaches the client without passing through the load balancer again) if the packets were not modified, but it provides almost no content awareness. This technology is very well suited to network-level load balancing, though it is sometimes used for very basic server load balancing at high speed.
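The stateful variant can be pictured as a connection table keyed by each packet's 5-tuple. The sketch below is a minimal, hypothetical Python illustration (the class name, the SHA-256-based pick and the documentation-range addresses are assumptions, not any product's implementation): the first packet of a flow selects a server by hashing, and every later packet of that flow reuses the stored choice, even after the server list changes.

```python
import hashlib
from typing import Dict, List, Tuple

# A flow is identified by its 5-tuple: (protocol, src_ip, src_port, dst_ip, dst_port).
FiveTuple = Tuple[str, str, int, str, int]


class StatefulL4Balancer:
    """Hypothetical packet-level balancer that pins every packet of a flow to one server."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.flows: Dict[FiveTuple, str] = {}  # connection table: flow -> chosen server

    def _hash_pick(self, flow: FiveTuple) -> str:
        # Stable hash of the 5-tuple (Python's built-in hash() of str is salted per process).
        digest = hashlib.sha256(repr(flow).encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]

    def add_server(self, server: str) -> None:
        self.servers.append(server)

    def route(self, flow: FiveTuple) -> str:
        # First packet of a flow: pick a server and remember the decision.
        # Later packets: reuse the stored decision instead of hashing again.
        if flow not in self.flows:
            self.flows[flow] = self._hash_pick(flow)
        return self.flows[flow]
```

A stateless balancer would simply hash every packet, which is cheaper but lets an existing flow move to another server whenever the server list changes.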
The second one acts on session contents. It requires that the input streams be reassembled and processed as a whole. The contents may be modified, and the output stream is segmented into new packets.
For this reason it is generally performed by proxies, which are often called layer 7 load balancers, or L7. This implies that there are two distinct connections, one on each side, and that there is no relation between input and output packet sizes or counts. The operations are always stateful, and the return traffic must pass through the load balancer. The extra processing comes at a cost, so it's not always possible to achieve line rate, especially with small packets.
On the other hand, it offers wide possibilities and is generally achieved by pure software, even if embedded into hardware appliances. This technology is very well suited for server load balancing. Packet-based load balancers are generally deployed in cut-through mode, so they are installed on the normal path of the traffic and divert it according to the configuration.
The return traffic doesn't necessarily pass through the load balancer. Some modifications may be applied to the network destination address in order to direct the traffic to the proper destination. In this case, it is mandatory that the return traffic passes through the load balancer. If the routes don't make this possible, the load balancer may also replace the packets' source address with its own in order to force the return traffic to pass through it.
Proxy-based load balancers are deployed as a server with their own IP addresses and ports, without architecture changes. Sometimes this requires some adaptations to the applications so that clients are properly directed to the load balancer's IP address and not directly to the server's. Some load balancers may have to adjust some servers' responses to make this possible, for example by rewriting the Location header of HTTP redirects that point at a server's own address.
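As a minimal sketch of such an adjustment, assuming a server that redirects clients to its own internal address, the hypothetical helper below rewrites a Location header value so the client comes back through the load balancer. It uses only Python's standard urllib.parse; the host names and the choice of HTTPS on the public side are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location: str, server_netloc: str, public_netloc: str,
                     public_scheme: str = "https") -> str:
    """Hypothetical helper: point a server-issued redirect back at the load balancer."""
    parts = urlsplit(location)
    if parts.netloc != server_netloc:
        # Relative redirects and redirects to other sites are left untouched.
        return location
    # Keep path, query and fragment; swap scheme and host:port for the public ones.
    return urlunsplit((public_scheme, public_netloc, parts.path, parts.query, parts.fragment))
```

For example, a redirect to http://10.0.0.11:8080/login?next=/cart becomes https://shop.example.com/login?next=/cart when the public name is shop.example.com.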
Some proxy-based load balancers may intercept traffic for an address they don't own, and spoof the client's address when connecting to the server. This allows them to be deployed as if they were a regular router or firewall, in a cut-through mode very similar to the packet based load balancers. This is particularly appreciated for products which combine both packet mode and proxy mode. In this case DSR is obviously still not possible and the return traffic still has to be routed back to the load balancer.
A very scalable layered approach would consist of having a front router which receives traffic from multiple load balanced links, and uses ECMP to distribute this traffic to a first layer of multiple stateful packet-based load balancers (L4).
These L4 load balancers in turn pass the traffic to an even larger number of proxy-based load balancers (L7), which have to parse the contents to decide what server will ultimately receive the traffic.
The number of components and possible paths for the traffic increases the risk of failure; in very large environments, it is even normal to permanently have a few faulty components being fixed or replaced. Load balancing done without awareness of the whole stack's health significantly degrades availability. For this reason, any sane load balancer will verify that the components it intends to deliver the traffic to are still alive and reachable, and it will stop delivering traffic to faulty ones.
This can be achieved using various methods. The most common one consists of periodically sending probes to ensure the component is still operational. These probes are called "health checks". They must be representative of the type of failure to address. For example, a ping-based check will not detect that a web server has crashed and no longer listens on a port, while a connection to the port will verify this, and a more advanced request may even validate that the server still works and that the database it relies on is still accessible.
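How individual check results turn into an up/down decision can be sketched as a small state machine. This is a hypothetical Python illustration, loosely modelled on the consecutive-success and consecutive-failure thresholds that HAProxy exposes as the "rise" and "fall" server parameters; the class and field names are assumptions. Requiring several consecutive contrary results keeps a single lost probe from evicting a healthy server.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Hypothetical up/down tracker for one server.

    fall: consecutive failed checks needed to mark an UP server DOWN.
    rise: consecutive successful checks needed to mark a DOWN server UP.
    """
    rise: int = 2
    fall: int = 3
    up: bool = True
    streak: int = 0  # consecutive results contradicting the current state

    def record(self, check_ok: bool) -> bool:
        if check_ok == self.up:
            self.streak = 0  # result confirms the current state
        else:
            self.streak += 1
            needed = self.fall if self.up else self.rise
            if self.streak >= needed:
                self.up = not self.up  # threshold reached: change state
                self.streak = 0
        return self.up
```

With rise=2 and fall=3, three failures in a row mark the server down, and two successes in a row bring it back; an isolated failure between successes changes nothing.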
Health checks often involve a few retries to cover for occasional measuring errors. The period between checks must be small enough to ensure the faulty component is not used for too long after an error occurs. Other methods consist in sampling the production traffic sent to a destination to observe if it is processed correctly or not, and to evict the components which return inappropriate responses.
However, this requires sacrificing a part of the production traffic, which is not always acceptable. A combination of these two mechanisms provides the best of both worlds, with both of them being used to detect a fault, and only health checks to detect the end of the fault. A final method involves centralized reporting: a central monitoring agent periodically updates all load balancers about all components' state. This gives a global view of the infrastructure to all components, though sometimes with less accuracy or responsiveness.
It's best suited for environments with many load balancers and many servers.
Layer 7 load balancers also face another challenge known as stickiness or persistence. The principle is that they generally have to direct multiple subsequent requests or connections from the same origin (such as an end user) to the same target.
The best known example is the shopping cart on an online store. If each click leads to a new connection, the user must always be sent to the server which holds his shopping cart. Content-awareness makes it easier to spot some elements in the request to identify the server to deliver it to, but that's not always enough.
For example, if the source address is used as a key to pick a server, it can be decided that a hash-based algorithm will be used and that a given IP address will always be sent to the same server, based on the remainder of dividing the address by the number of available servers. But if one server fails, the result changes, and most users are suddenly sent to a different server and lose their shopping cart.
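The effect is easy to reproduce. The sketch below is a hypothetical Python illustration of the naive scheme described above (address as an integer, modulo the number of servers; the function names and sample addresses are assumptions). It measures how many clients change server when one of four servers fails.

```python
import ipaddress
from typing import List


def pick_by_modulo(client_ip: str, servers: List[str]) -> str:
    # Naive source hashing: the address as an integer, modulo the number of servers.
    return servers[int(ipaddress.ip_address(client_ip)) % len(servers)]


def remapped_fraction(clients: List[str], before: List[str], after: List[str]) -> float:
    # Share of clients whose server changes between the two server lists.
    moved = sum(pick_by_modulo(c, before) != pick_by_modulo(c, after) for c in clients)
    return moved / len(clients)


clients = [f"192.0.2.{i}" for i in range(1, 201)]
servers = ["s1", "s2", "s3", "s4"]
after_failure = ["s1", "s2", "s4"]  # s3 went down
print(remapped_fraction(clients, servers, after_failure))  # prints 0.745
```

With these 200 sample addresses, about three quarters of the clients move, not only the quarter that was on the failed server.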
The solution to this issue consists of memorizing the chosen target so that each time the same visitor is seen, he's directed to the same server regardless of the number of available servers.
The information may be stored in the load balancer's memory, in which case it may have to be replicated to other load balancers if it's not alone, or it may be stored in the client's memory using various methods, provided that the client is able to present this information back with every request (cookie insertion, redirection to a sub-domain, etc.).
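Memorizing the chosen target can be sketched as a lookup table consulted before any balancing decision. In this hypothetical Python example (class and method names are assumptions), the visitor key stands for whatever the load balancer extracts, such as a cookie value; a visitor is re-assigned only when the remembered server is no longer alive, so a change in the number of servers does not move anyone else.

```python
import itertools
from typing import Dict, List, Optional


class StickyBalancer:
    """Hypothetical persistence layer: remembers which server each visitor was given."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self._rotation = itertools.cycle(self.servers)  # round-robin for new visitors
        self.table: Dict[str, str] = {}  # visitor key (e.g. a cookie value) -> server

    def pick(self, visitor: str, alive: Optional[List[str]] = None) -> str:
        # Only servers this balancer knows about can be chosen.
        candidates = set(self.servers if alive is None else alive) & set(self.servers)
        if not candidates:
            raise RuntimeError("no server available")
        server = self.table.get(visitor)
        if server is None or server not in candidates:
            # New visitor, or remembered server is down: take the next live server
            # in round-robin order and remember it.
            server = next(s for s in self._rotation if s in candidates)
            self.table[visitor] = server
        return server
```

Here the table lives in the load balancer's memory; storing the server name in a cookie instead would move the same table to the client side.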
This mechanism provides the extra benefit of not having to rely on unstable or unevenly distributed information such as the source IP address. This is in fact the strongest reason to adopt a layer 7 load balancer instead of a layer 4 one. Extracting information such as a cookie, a Host header or a URL may require the load balancer to decrypt TLS traffic, and possibly to re-encrypt it when passing it to the server. This expensive task explains why some high-traffic infrastructures may have a lot of load balancers.
Since a layer 7 load balancer may perform a number of complex operations on the traffic (decrypt, parse, modify, match cookies, decide what server to send to, etc.), it can definitely cause some trouble, and will very commonly be accused of being responsible for a lot of trouble that it only revealed. Often it will be discovered that servers are unstable and periodically go up and down, or for web servers, that they deliver pages with some hard-coded links forcing the clients to connect directly to one specific server without passing via the load balancer, or that they take ages to respond under high load, causing timeouts.
That's why logging is an extremely important aspect of layer 7 load balancing. Once a problem is reported, it is important to figure out whether the load balancer took a wrong decision and, if so, why, so that it doesn't happen anymore.
The product name is written "HAProxy", while "haproxy" usually designates the executable program, software package or process. However, both are commonly used for both purposes, and are pronounced H-A-Proxy. Very early on, "haproxy" used to stand for "high availability proxy" and the name was written as two separate words, though by now it means nothing other than "HAProxy".
What HAProxy is and isn't
HAProxy is:
- a TCP proxy: it can accept a TCP connection from a listening socket, connect to a server and attach these sockets together, allowing traffic to flow in both directions; IPv4, IPv6 and even UNIX sockets are supported on either side, so this can provide an easy way to translate addresses between different families.
When terminating TLS, a lot of settings can be applied per name (SNI), and may be updated at runtime without restarting. Such setups are extremely scalable, and deployments involving tens to hundreds of thousands of certificates have been reported. As a TCP normalizer, it protects fragile TCP stacks from protocol attacks, and also allows optimizing the connection parameters with the client without having to modify the servers' TCP stack settings. As an HTTP normalizer, it protects against a lot of protocol-based attacks. Additionally, protocol deviations that the specification tolerates are fixed so that they don't cause problems on the servers.
This helps fix interoperability issues in complex environments. Since HAProxy can base its routing decision on any element of the request, it is possible to handle multiple protocols over the same port. In TCP mode, load balancing decisions are taken for the whole connection; in HTTP mode, decisions are taken per request. Its cache keeps objects in memory only: it will not store them to any persistent storage. Please note that this caching feature is designed to be maintenance free and focuses solely on saving haproxy's precious resources, not on saving the server's resources.
Caches designed to optimize servers require much more tuning and flexibility. This results in resource savings and a reduction of maintenance costs. There is excellent open-source software dedicated to this task, such as Squid. However, HAProxy can be installed in front of such a proxy to provide load balancing and high availability.
HAProxy is not a web server, so it cannot be turned into a static web server (dynamic servers are supported through FastCGI, however). There is excellent open-source software for this, such as Apache or Nginx, and HAProxy can easily be installed in front of them to provide load balancing, high availability and acceleration. Packet-level operations such as NAT or DSR are not its job either: these are tasks for lower layers.
Quick reminder about HTTP
When HAProxy is running in HTTP mode, both the request and the response are fully analyzed and indexed, thus it becomes possible to build matching criteria on almost anything found in the contents.
Understanding how HTTP requests and responses are formed makes it easier to write correct rules and to debug existing configurations. HTTP is a transaction-driven protocol: each request leads to one and only one response. Traditionally, a TCP connection is established from the client to the server, a request is sent by the client through the connection, the server responds, and the connection is closed. Since the connection is closed by the server after the response, the client does not need to know the content length.
Due to the transactional nature of the protocol, it was possible to improve it to avoid closing the connection between two subsequent transactions; this is called the keep-alive mode. In this mode, however, the server must indicate the content length of each response so that the client does not wait indefinitely. For this, a special header is used: "Content-Length". The advantages of keep-alive are reduced latency between transactions and less processing power required on the server side.
It is generally better than the close mode, but not always because the clients often limit their concurrent connections to a smaller value. Another improvement in the communications is the pipelining mode. It still uses keep-alive, but the client does not wait for the first response to send the second request.
This can obviously have a tremendous benefit on performance, because the network latency between subsequent requests is eliminated. Since HTTP/1.1 responses carry no identifier linking them to their request, the server must reply in exactly the same order as the requests were received. HTTP/2 goes further with multiplexing: each transaction is assigned its own stream identifier, and all streams are multiplexed over an existing connection.
Many requests can be sent in parallel by the client, and responses can arrive in any order since they also carry the stream identifier.
By default HAProxy operates in keep-alive mode with regard to persistent connections: for each connection it processes each request and response, and leaves the connection idle on both sides between the end of a response and the start of a new request.
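Because several responses follow each other on the same keep-alive connection, the receiver relies on the announced length to know where each one ends. The sketch below is a simplified, hypothetical Python parser (the function name is an assumption) that only handles the Content-Length case; real HTTP/1.1 also allows chunked transfer encoding, which this sketch ignores.

```python
from typing import List, Tuple


def split_responses(stream: bytes) -> List[Tuple[bytes, bytes]]:
    """Split back-to-back HTTP/1.1 responses read from one keep-alive connection.

    Simplified sketch: relies on Content-Length only (no chunked encoding); a response
    without it is treated as bodiless, which is only right for cases such as 204.
    Returns (status line, body) pairs in arrival order.
    """
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")  # headers end at the first empty line
        if not sep:
            raise ValueError("incomplete headers")
        status_line, *header_lines = head.split(b"\r\n")
        length = 0
        for line in header_lines:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":  # header names are case-insensitive
                length = int(value.strip())
        if len(rest) < length:
            raise ValueError("incomplete body")
        responses.append((status_line, rest[:length]))
        stream = rest[length:]  # whatever follows belongs to the next response
    return responses
```

Since pipelined responses come back in request order, pairing them with their requests is purely positional, e.g. list(zip(["/a", "/b"], split_responses(data))).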
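HAProxy supports 4 connection modes:
- keep alive: all requests and responses are processed (default);
- tunnel: only the first request and response are processed, everything else is forwarded with no analysis (deprecated);
- server close: the server-facing connection is closed after the response;
- close: the connection is actually closed after the end of the response.
The request line
Line 1 of an HTTP request is the "request line". It is always composed of 3 fields: a method, a URI and a version tag. The method itself cannot contain any colon (':') and is limited to alphabetic letters.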
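The URI may take several forms: a relative URI (a path, optionally followed by a query string), which is generally what is received by servers, reverse proxies and transparent proxies; an absolute URI, which is generally what is received by forward proxies; a star ('*'), only accepted with the OPTIONS method, which is used to inquire about a next hop's capabilities; and an authority (host:port), used with the CONNECT method. All those various combinations make it desirable that HAProxy performs the splitting itself rather than leaving it to the user to write a complex or inaccurate regular expression.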
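A simplified, hypothetical Python sketch of that splitting for the common relative-URI case (the function name is an assumption; absolute URIs, '*' and authority forms are not handled) splits the request line into its three fields, then splits the URI on the first question mark into path and query string.

```python
from typing import Tuple


def split_request_line(line: str) -> Tuple[str, str, str, str]:
    """Hypothetical splitter for 'METHOD URI VERSION' with a relative URI.

    Returns (method, path, query string, version).
    """
    method, uri, version = line.split(" ")  # exactly three fields, else ValueError
    if not method.isalpha():
        raise ValueError("the method is limited to alphabetic letters")
    path, _, query = uri.partition("?")  # path before the first '?', query string after
    return method, path, query, version
```

How the query string is decoded afterwards depends on the application; Python's urllib.parse.parse_qs applies the common form-encoding convention, turning "q=haproxy&page=2" into {"q": ["haproxy"], "page": ["2"]}.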
In a relative URI, two sub-parts are identified. The part before the question mark is called the "path". It is typically the relative path to static objects on the server. The part after the question mark is called the "query string". It is mostly used with GET requests sent to dynamic scripts and is very specific to the language, framework or application in use.
The request headers
The headers start at the second line.
They are composed of a name at the beginning of the line, immediately followed by a colon ':'. Traditionally, an LWS is added after the colon but that's not required. Then come the values. Multiple identical headers may be folded into one single line, delimiting the values with commas, provided that their order is respected.
This is commonly encountered in the "Cookie:" field. A header may span multiple lines if the subsequent lines begin with an LWS. Contrary to a common misconception, header names are not case-sensitive, and neither are their values when they refer to other header names, as in the "Connection:" header. The end of the headers is indicated by the first empty line. People often say that it's a double line feed, which is not exact, even if a double line feed is one valid form of empty line.
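The rules above (case-insensitive names, comma-separated values, continuation lines starting with whitespace, headers ending at the first empty line) can be sketched in Python as follows. This is a hypothetical and deliberately naive illustration (the function name is an assumption): splitting every header on commas is wrong for some fields, such as Set-Cookie or date values, and modern parsers reject or unfold obsolete line folding rather than rely on it.

```python
from typing import Dict, List


def parse_headers(lines: List[str]) -> Dict[str, List[str]]:
    """Index header lines by lower-cased name, splitting comma-separated values."""
    unfolded: List[str] = []
    for line in lines:
        if line == "":
            break  # the first empty line ends the headers
        if line[0] in " \t" and unfolded:
            unfolded[-1] += " " + line.strip()  # continuation of the previous header
        else:
            unfolded.append(line)
    headers: Dict[str, List[str]] = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        values = [v.strip() for v in value.split(",") if v.strip()]
        # Names are case-insensitive, so repeated headers merge under one key, in order.
        headers.setdefault(name.strip().lower(), []).extend(values)
    return headers
```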
Fortunately, HAProxy takes care of all these complex combinations when indexing headers, checking values and counting them, so there is no reason to worry about the way they could be written, but it is important not to accuse an application of being buggy if it does unusual, valid things.
This is necessary for proper analysis and helps less capable HTTP parsers to work correctly and not be fooled by such complex constructs.
An HTTP response looks very much like an HTTP request; both are called HTTP messages. HTTP also supports so-called "informational responses", with 1xx status codes. These messages are special in that they don't convey any part of the response; they're just used as a sort of signaling message, for instance to ask a client to continue posting its request.
In that case, the requested information will be carried by the next non-informational response following the informational one. HAProxy handles these messages and is able to correctly forward and skip them, and only processes the next non-informational response. As such, these messages are neither logged nor transformed, unless explicitly stated otherwise. A "101 Switching Protocols" status message indicates that the protocol is changing over the same connection and that haproxy must switch to tunnel mode, just as if a CONNECT had occurred.
Then the Upgrade header contains additional information about the type of protocol the connection is switching to.
The response line
Line 1 of a response is the "response line". It is always composed of 3 fields: a version tag, a status code and a reason. The "reason" field is just a hint and is not parsed by clients.
Anything can be found there, but it's common practice to respect the well-established messages.
The response headers
Response headers work exactly like request headers, and as such, HAProxy uses the same parsing function for both. Please refer to the section on request headers above.
HAProxy – sysadmin’s swiss army knife
Heck, we use it on single-server setups as well!
Load balancing
HAProxy supports load balancing of TCP (layer 4) and HTTP (layer 7) traffic with various load balancing algorithms: round-robin, static, by weight, cookie or header, to name a few.
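As an example of a weight-based algorithm, here is a minimal Python sketch of smooth weighted round-robin, the interleaving variant used by Nginx. It is shown only to illustrate weighting; it is not HAProxy's own implementation, and the function name is an assumption.

```python
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], n: int) -> List[str]:
    """Pick n servers so that each appears in proportion to its weight,
    interleaved rather than in bursts."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for server, weight in weights.items():
            current[server] += weight  # every server earns its weight each round
        chosen = max(current, key=current.get)  # ties go to the first server listed
        current[chosen] -= total  # the winner pays back the total weight
        picks.append(chosen)
    return picks
```

With weights 5, 1 and 1 for servers a, b and c, seven picks give a, a, b, a, c, a, a: server a gets five requests out of seven, spread out rather than sent in one burst.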
Here are a few examples.
Slowloris mitigation
Slowloris attacks are a type of DoS attack which allows a single machine to take down a web server with minimal bandwidth and minimal side effects on unrelated services and ports. Slowloris tries to keep as many connections to the target web server open as possible, for as long as possible. It accomplishes this by opening connections to the target web server and sending a partial request, then periodically sending subsequent HTTP headers, adding to, but never completing, the request.
Affected servers keep these connections open, filling their maximum concurrent connection pool and eventually denying additional connection attempts from legitimate clients. Threaded web servers are the most susceptible to such attacks (e.g. Apache, especially when using the prefork multi-processing module). However, web servers with an asynchronous, event-driven architecture, where request data is received in the background while other requests are processed, are also in peril of such an attack.
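Why a per-read idle timeout does not help can be sketched with a small simulation. In this hypothetical Python example (function names and timings are assumptions), an attacker sends one more header line every 5 seconds and never finishes the request: within the simulated five minutes, a 30-second idle timeout never fires, while a deadline on receiving the complete headers, measured from the moment the connection was accepted, does.

```python
from typing import List


def idle_timeout_fires(arrivals: List[float], idle_timeout: float) -> bool:
    """True if the gap before any header chunk exceeds the idle timeout.

    `arrivals` are chunk arrival times in seconds since the connection was accepted.
    """
    previous = 0.0
    for t in arrivals:
        if t - previous > idle_timeout:
            return True
        previous = t
    return False


def header_deadline_fires(arrivals: List[float], headers_complete: bool, deadline: float) -> bool:
    """True if the complete headers did not arrive within `deadline` seconds of accept."""
    if not headers_complete:
        return True  # headers never finish, so the deadline is eventually reached
    last = arrivals[-1] if arrivals else 0.0
    return last > deadline


# Attacker: one more header line every 5 s for 5 minutes, request never completed.
attack = [5.0 * k for k in range(1, 61)]
print(idle_timeout_fires(attack, idle_timeout=30.0))                         # False
print(header_deadline_fires(attack, headers_complete=False, deadline=10.0))  # True
```

A legitimate client on a slow link that completes its headers within the deadline is unaffected.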
HAProxy is also event-driven, but with an intelligent protection mechanism. To mitigate slowloris attacks, HAProxy only needs one directive, "timeout http-request", which defines the maximum time allowed to receive the complete HTTP request headers (not counting the body). We found that 10 seconds is low enough to keep the bad guys at bay and high enough to avoid terminated connections when the client has slow internet access.
This means that concurrency is severely affected by the choice of protocol.
At the far extremes of concurrency and latency, TLS has a serious performance effect upon our response times. From a response time perspective, HAProxy and Envoy both perform more consistently under load than any other option.
Requests per second performance
Next, we will look at our requests per second. This measures the throughput of each of these systems under load, giving us a good idea of the performance profile for each of these load balancers as they scale:
Chart: Requests per Second over HTTP by Load Balancer and Concurrency Level
Surprisingly, Envoy has a far higher throughput than all other load balancers at the concurrency range.
While Envoy is also higher at other concurrency levels, the magnitude of the difference is especially high at the concurrency level. This may be due to some intelligent load balancing or caching inside of Envoy as part of the defaults. It warrants further investigation to determine if this result is representative of real-world performance outside our limited benchmark. Traefik stays more consistent under load than Nginx and HAProxy, but this may be mitigated by more optimized configuration of the other load balancers.
This may be a combination of factors: SSL libraries used by the load balancer, ciphers supported by the client and server, and other factors such as key length for some algorithms. This, however, is only one view of the picture. Loggly is a great way to plot trend graphs of performance logs.
During this process, our load balancers were forwarding their request logs to Loggly via syslog. After the load tests, we generated a chart using the Loggly charting feature to see the HAProxy view of the time it took to hit our backend server during the course of the event:
Chart: HAProxy Backend average response times via Loggly.
Loggly gives you the power to choose from several statistics like average or percentile.
Here, you can see the round trip times from our load balancer to our backend. We are plotting an average of the HAProxy Tr field, which shows the average time in milliseconds spent waiting for the server to send a full HTTP response, not counting data.
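An average can hide a slow tail, which is why the choice between average and percentile matters. The Python sketch below uses made-up Tr values in milliseconds (nine fast responses and one slow one) and a nearest-rank percentile helper; the helper's name and the data are assumptions for illustration.

```python
import math
from typing import List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up Tr samples in milliseconds: nine fast responses and one slow one.
tr = [20] * 9 + [2000]
print(sum(tr) / len(tr))   # 218.0
print(percentile(tr, 50))  # 20
print(percentile(tr, 95))  # 2000
```

The average suggests every request takes about 218 ms, while the median shows that most take 20 ms and the 95th percentile exposes the 2-second outlier.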
This graph shows the load test running at a lower concurrency level with HAProxy, followed by a break, then at a higher concurrency level. We can see that the backend response time starts off low and increases as we increase the concurrency level.
At its peak, we see the average backend response time at 3. This makes sense because we are loading the backend more heavily so it should take longer to respond.
This can give operators important information about what needs to be scaled in a stack.
It is known to have handled over TB a day of traffic at Yahoo. It supports keep-alive, filtering and anonymization of content requests, and is extensible via an API that allows users to create custom plugins to modify HTTP headers, handle ESI requests, or design new cache algorithms.
It features a reverse proxy httpd-accelerator mode that caches incoming requests for outgoing data. It supports rich traffic optimization options, access control, authorization, logging facilities, and much more.
Pound
Pound is another free and open-source, lightweight reverse proxy, load balancer and front end for web servers.
It can also be deployed and configured to act as a reverse proxy.