Reputation: 513
I am a C++ backend developer working on the server side of a real-time game. The application architecture looks like this:
1) I have a Client class, which processes requests from the game client. Examples of requests: login, buying something in the in-game store, or performing some action. This Client also handles user input events from the game client (these are very frequent events, sent about ten times per second while the player is in gameplay).
2) I have a thread pool. When a game client connects to the server, I create a Client instance and bind it to one of the threads from the pool, so we have a one-to-many relationship: one thread, many Clients. Round-robin is used to choose the thread for binding.
3) I use libev to manage all events inside the server. This means that when a Client instance receives data from the game client over the network, handles a request, or sends data back over the network, it occupies its thread. While it is doing that work, all other Clients sharing the same thread are blocked.
So the thread pool is the bottleneck of the application. To increase the number of concurrent players who can play without lag, I need to increase the number of threads in the thread pool.
Right now the application runs on a server with 24 logical CPUs (according to cat /proc/cpuinfo), and I set the thread pool size to 24 (1 processor - 1 thread). This means that with the current 2000 players online, each thread serves about 84 Client instances. top says the processors are less than 10 percent used.
Now the question: if I increase the number of threads in the thread pool, will server performance increase or decrease (context-switching overhead vs. blocked Clients per thread)?
UPD: 1) The server uses async I/O (libev + epoll), so when I say a Client is locked while sending and receiving data, I mean copying to buffers. 2) The server also has background threads for slow tasks: database operations, heavy calculations, etc.
Upvotes: 5
Views: 2700
Reputation: 26496
Well, a few issues.
2) I have a thread pool. When a game client connects to the server, I create a Client instance and bind it to one of the threads from the pool, so we have a one-to-many relationship: one thread, many Clients. Round-robin is used to choose the thread for binding.
You didn't mention asynchronous I/O in any of these points. I believe your true bottleneck here is not the thread count, but the fact that a thread is blocked by an I/O action. By using asynchronous I/O (which is not simply an I/O action on another thread), the throughput of your server increases by orders of magnitude.
3) I use libev to manage all events inside the server. This means that when a Client instance receives data from the game client over the network, handles a request, or sends data back over the network, it occupies its thread. While it is doing that work, all other Clients sharing the same thread are blocked.
Again, without asynchronous I/O this is very much 90's server-side architecture (a la Apache). For maximum performance, your threads should only do CPU-bound tasks and should never wait on I/O.
So the thread pool is the bottleneck of the application. To increase the number of concurrent players who can play without lag, I need to increase the number of threads in the thread pool.
Dead wrong. Read about the C10k problem.
Now the question: if I increase the number of threads in the thread pool, will server performance increase or decrease (context-switching overhead vs. blocked Clients per thread)?
The rule of thumb of having as many threads as cores is only valid when your threads do purely CPU-bound work, are never blocked, and are 100% saturated with CPU tasks. If your threads are also blocked by locks or I/O actions, that rule breaks down.
If we take a look at common server-side architectures, we can determine the best design:
Apache style architecture:
A fixed-size thread pool; a thread is assigned to each connection in the connection queue; no asynchronous I/O.
Pros: none.
Cons: extremely bad throughput.
Nginx/Node.js architecture:
A single-threaded, multi-process application using asynchronous I/O.
Pros: a simple architecture that eliminates multi-threading problems. Works extremely well for servers that serve static data.
Cons: if the processes have to share data, a huge amount of CPU time is burned on serializing, passing, and deserializing data between processes. Also, a multi-threaded application can increase performance if done correctly.
Modern .NET architecture:
A multi-threaded, single-process application using asynchronous I/O.
Pros: if done correctly, the performance can be outstanding!
Cons: it's somewhat tricky to tune a multi-threaded application and use it without corrupting shared data.
So to sum it up, I think that in your specific case you should definitely use asynchronous I/O only, plus a thread pool with the number of threads equal to the number of cores.
If you're using Linux, Facebook's Proxygen can manage everything we talked about (multithreaded code with asynchronous I/O) beautifully for you. Hey, Facebook is using it!
Upvotes: 2
Reputation: 1811
The optimal number of threads depends on how your clients use the CPU.
If the CPU is the only bottleneck and every core running a thread is constantly at full load, then setting the number of threads to the number of cores is a good idea.
If your clients do I/O (network, file, even page swapping) or any other operation that blocks your thread, then a higher number of threads is necessary, because some of them will be blocked even while CPU is available.
In your scenario I would think it is the second case: the threads are blocked because 24 client events are active but are only using 10% of the CPU (so the events processed by a thread waste 90% of its CPU resources). If this is the case, it would be a good idea to raise the thread count to something like 240 (number of cores * 100 / average load) so another thread can run on an otherwise idling CPU.
But be warned: if clients are pinned to a single thread (thread A handles clients 1, 2, 3 and thread B handles clients 4, 5, 6), increasing the thread pool will help, but there may still be sporadic lags whenever two client events have to be processed by the same thread.
Upvotes: 1
Reputation: 89
Starting with one thread per core can be a good idea.
In addition, in some cases, calculating the WCET (Worst-Case Execution Time) is a way to determine which configuration is faster (cores don't always run at the same frequency). You can measure it easily with timers: take a timestamp at the beginning of the function and another at the end, and subtract the values to obtain the result in ms.
In my case, I also had to consider power consumption, since it was an embedded system. Some tools allow measuring CPU power consumption and thus help decide which configuration is the most suitable in that specific situation.
Upvotes: 1
Reputation: 816
The optimal number of threads is most often either the number of cores in your machine or twice that number. To achieve maximum throughput, there must be minimal contention between the threads, and the sweet spot usually lies between the number of cores and twice the number of cores.
I would recommend running trials to figure out where you get optimum performance.
Upvotes: 1
Reputation: 1267
Many factors can affect the overall performance, including how much work each thread does per client, how much cross-thread communication is required, whether there is any resource contention between threads, and so on. The best thing to do is to write a stress test that simulates realistic client load and measure throughput at different thread-pool sizes.
This has the added benefit that you can use the same stress test alongside a profiler to determine whether you can extract any more performance from your implementation.
Upvotes: 1