Dmitry Sychov

Reputation: 245

Is Nginx's approach to CPU scalability (a per-process epoll event queue) optimal?

Nginx's approach to CPU scalability is based on creating a number of almost independent processes, each owning its own event queue, and then using SO_REUSEPORT to spread incoming connections, IRQs, and NIC packets over all cores relatively evenly.

Does it lead to better scalability (less kernel data sharing = fewer locks) than creating only one Linux process with an array of threads equal to the number of CPUs, and a per-thread event queue in each thread?
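For concreteness, here is a minimal sketch of the per-process pattern I mean (error handling omitted; the port number and worker count are arbitrary examples): each forked worker binds its own SO_REUSEPORT listener and runs its own epoll loop.

```c
/* Sketch of the Nginx-style model: N forked workers, each with its own
 * SO_REUSEPORT listening socket and its own epoll event queue. */
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static void worker(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    /* The kernel load-balances new connections across all sockets
     * bound to the same port with SO_REUSEPORT. */
    setsockopt(lfd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);          /* arbitrary example port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, SOMAXCONN);

    int ep = epoll_create1(0);            /* per-process event queue */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int cfd = accept(lfd, NULL, NULL);
            if (cfd >= 0) close(cfd);     /* placeholder for real handling */
        }
    }
}

int main(void) {
    for (int i = 0; i < 4; i++)           /* e.g. one worker per core */
        if (fork() == 0) { worker(); _exit(0); }
    for (;;) pause();                     /* master only supervises */
}
```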

Here is an example of Nginx scaling up to around 32 CPUs. Disabled hyper-threading and the overall count of 36 physical cores could be the main reason for this, as could NIC saturation or a drop in core clock speed due to thermal throttling:

https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/

Also: https://dzone.com/articles/inside-nginx-how-we-designed

Upvotes: 8

Views: 1429

Answers (3)

Anon

Reputation: 7144

Theoretically, purely asynchronous calls in a situation where you don't use (red) threads and don't need to share data would be better than using (red) threads, because you avoid the context-switching overhead that forces you to bounce in and out of the kernel just to switch to another thread. You may also be less likely to get contention (threads can accidentally share something internal, such as a cache line).

In reality it could go either way, depending on the program in question, the vagaries of the programming language, the kernel, whether the threads are red or green, the hardware, the task, the skill of the programmer, etc.

Coming back to your original question: NGINX's approach is going to be good and the overheads are going to be low (contrast it with Apache, for example). For pure "packet pushing" it is an excellent low-overhead approach, but you may find a trade-off when it comes to flexibility. It is also worth noting that NGINX can spin up a worker per core, so at that point it can reap the benefits of affinity (less data moving around, because everything is hopefully local) while still being lower overhead...
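For illustration, here is a minimal sketch of what such per-core pinning can look like on Linux; the helper name is hypothetical, and error handling is omitted:

```c
/* Hypothetical helper: pin the calling process (or thread) to one core
 * so its working set stays in that core's caches. Linux-specific. */
#define _GNU_SOURCE
#include <sched.h>

static int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling task */
}
```

NGINX itself exposes this through the worker_processes and worker_cpu_affinity directives, so you rarely need to do it by hand.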

Depending on where the data is coming from and going to, you can likely beat NGINX in specific scenarios (e.g. by using something like DPDK, or by using technology built around techniques like io_uring), but perhaps at some point in the future NGINX itself will adopt such technology...
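As a point of reference, here is a minimal liburing sketch of an asynchronous accept, one of the io_uring techniques mentioned above (error handling omitted; listen_fd is assumed to be an already-bound, listening socket):

```c
/* Sketch: accept one connection asynchronously via io_uring.
 * Build with -luring. */
#include <liburing.h>

int accept_one(int listen_fd) {
    struct io_uring ring;
    io_uring_queue_init(256, &ring, 0);      /* SQ/CQ of 256 entries */

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
    io_uring_submit(&ring);                  /* hand the accept to the kernel */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);          /* block until it completes */
    int conn_fd = cqe->res;                  /* new fd, or -errno on failure */
    io_uring_cqe_seen(&ring, cqe);
    io_uring_queue_exit(&ring);
    return conn_fd;
}
```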

Upvotes: 1

Maxim Egorushkin

Reputation: 136208

Does it lead to better scalability (less kernel data sharing = fewer locks) than creating only one Linux process with an array of threads equal to the number of CPUs, and a per-thread event queue in each thread?

From the article "The SO_REUSEPORT socket option":

The first of the traditional approaches is to have a single listener thread that accepts all incoming connections and then passes these off to other threads for processing. The problem with this approach is that the listening thread can become a bottleneck in extreme cases. In early discussions on SO_REUSEPORT, Tom noted that he was dealing with applications that accepted 40,000 connections per second.
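For contrast, the traditional single-listener model from the quote looks roughly like this (a sketch only; queue_push/queue_pop stand for a hypothetical lock-protected fd queue, and the entry points are meant for pthread_create):

```c
/* Sketch of the traditional model: one thread accepts everything and
 * hands fds to workers through a shared queue. The accept loop (and
 * the queue's lock) is the serialization point. */
#include <pthread.h>
#include <sys/socket.h>

extern void queue_push(int fd);   /* hypothetical locked fd queue */
extern int  queue_pop(void);

static void *acceptor(void *arg) {
    int lfd = *(int *)arg;
    for (;;)
        queue_push(accept(lfd, NULL, NULL)); /* all connections funnel here */
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int cfd = queue_pop();               /* contends on the queue lock */
        /* ... serve cfd, then close it ... */
    }
}
```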

Upvotes: 0

Liam Kelly

Reputation: 3704

It looks like we can get hard data to answer this question by comparing Nginx to Envoy Proxy, because Envoy uses the architecture you are curious about:

Envoy uses a single process with multiple threads architecture. A single master thread controls various sporadic coordination tasks while some number of worker threads perform listening, filtering, and forwarding.
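Schematically, that model is one process whose worker threads each run their own event loop on a shared listening socket. A minimal epoll sketch of the idea (error handling and real connection handling omitted; the listener setup in main() is assumed, and worker is meant as a pthread_create entry point):

```c
/* Sketch of the Envoy-style model: one process, N worker threads,
 * each with its own epoll instance watching one shared listener. */
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static int g_listen_fd;                   /* bound + listening; set up in main() */

static void *worker(void *arg) {
    (void)arg;
    int ep = epoll_create1(0);            /* per-thread event queue */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE,
                              .data.fd = g_listen_fd };
    /* EPOLLEXCLUSIVE (Linux 4.5+) wakes only one waiting thread per
     * connection, avoiding a thundering herd on the shared listener. */
    epoll_ctl(ep, EPOLL_CTL_ADD, g_listen_fd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int cfd = accept(g_listen_fd, NULL, NULL);
            if (cfd >= 0) close(cfd);     /* placeholder for real handling */
        }
    }
    return NULL;
}
```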

While they were initially developed years apart to solve different problems, they currently have extremely similar capabilities and are often compared against one another.

Looking at one such comparison, Envoy showed better throughput and latency. Another comparison pits Ambassador (which is based on Envoy) against Nginx, and again Envoy showed better results.

Given this data, I'd say that yes, the single-process, event-loop-and-thread-pool model (Envoy) seems to scale better than the multiple-processes-with-shared-IPC model (Nginx).

Upvotes: 0
