user4388177

Reputation: 2613

Performance of multi-threading exceeding cores

If I have a process that starts X threads, will there ever be a performance gain from making X higher than the number of CPU cores (assuming all the threads do synchronous work, with no async calls to storage/network)?

E.g. if I have a two-core CPU, will I just slow the application down by starting 3+ constantly working threads?

Upvotes: 2

Views: 172

Answers (4)

someneat

Reputation: 358

This question: Optimal number of threads per core might help you.

In that thread I wrote an answer describing a scenario in which having more threads than available cores boosts performance.

Upvotes: 1

Andriy Berestovskyy

Reputation: 8534

It makes sense to run more threads than you have cores if your threads make read/write/send/recv syscalls or similar, or sleep on locks, etc.

If your threads are pure computation threads, adding more of them will slow the system down because of context switches.

If you still need more threads by design, you might want to look into cooperative multitasking. Both Windows and Linux have APIs for that, and switching this way is faster than kernel context switches. On Windows it is called fibers:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms682661(v=vs.85).aspx

On Linux it is the set of functions makecontext()/getcontext()/swapcontext():

http://man7.org/linux/man-pages/man3/makecontext.3.html
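
For illustration, here is a minimal sketch of that Linux API (getcontext/makecontext/swapcontext). The task/stack names and the 64 KiB stack size are arbitrary choices, not anything the API mandates; the point is that the switch between contexts happens entirely in user space, at points you choose:

    // cooperative.cpp -- build on Linux with: g++ -O2 cooperative.cpp
    #include <ucontext.h>
    #include <cstdio>

    static ucontext_t main_ctx, task_ctx;
    static char task_stack[64 * 1024];       // stack for the cooperative task

    // A cooperative "task": it yields back to main explicitly instead of
    // being preempted by the kernel scheduler.
    static void task() {
        std::puts("task: step 1");
        swapcontext(&task_ctx, &main_ctx);   // yield to main
        std::puts("task: step 2");
        // returning resumes uc_link, i.e. main_ctx
    }

    int main() {
        getcontext(&task_ctx);               // initialize the context
        task_ctx.uc_stack.ss_sp = task_stack;
        task_ctx.uc_stack.ss_size = sizeof(task_stack);
        task_ctx.uc_link = &main_ctx;        // continue here when task() returns
        makecontext(&task_ctx, task, 0);

        std::puts("main: switching to task");
        swapcontext(&main_ctx, &task_ctx);   // run task until it yields
        std::puts("main: back, resuming task");
        swapcontext(&main_ctx, &task_ctx);   // resume task until it returns
        std::puts("main: task finished");
    }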

Upvotes: 1

Giulio Franco

Reputation: 3230

It is possible for that to happen. Both Intel and AMD currently implement forms of SMT (simultaneous multithreading) in their CPUs. This means that, in general, a single thread of execution may not be able to exploit 100% of the computing resources. Modern CPUs execute instructions in multiple pipelined steps so that the clock frequency can be increased (less gets done in each cycle, so you can do more cycles). The downside of this approach is that, if you have two consecutive instructions A and B, with the latter depending on the result of the former, you may have to wait some clock cycles doing nothing, just waiting for instruction A to complete. So CPU designers came up with SMT, which lets the CPU interleave instructions from two different threads/processes on the same pipeline in order to fill such gaps.

Note: it is not exactly like this; CPUs don't just wait. They try to guess the result of the first operation and execute the second assuming that result. If the guess is wrong, they cancel the pending instructions and start over. They also have feedback circuits that allow tighter execution of interdependent instructions, and nowadays branch predictors are surprisingly good. The pipeline fares better if you can fill the gaps with instructions from some other thread instead of relying on a guess, but this potentially halves the amount of cache each executing thread can use.
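
As a rough, hypothetical illustration of those gaps (the recurrence and iteration count below are made up), the following micro-benchmark compares one long dependency chain with two independent chains doing the same total number of operations. On most pipelined CPUs the second loop is noticeably faster, because independent operations can overlap in the pipeline; SMT exploits the same kind of slack by filling it with instructions from another thread. Build with optimizations, e.g. g++ -O2:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>

    int main() {
        constexpr std::uint64_t N = 500'000'000;
        volatile std::uint64_t seed = 1;     // keep the compiler from folding everything

        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t x = seed;
        for (std::uint64_t i = 0; i < N; ++i)
            x = x * 3 + 1;                   // each step depends on the previous one
        auto t1 = std::chrono::steady_clock::now();

        std::uint64_t y = seed, z = seed;
        for (std::uint64_t i = 0; i < N / 2; ++i) {
            y = y * 3 + 1;                   // two independent chains: the CPU can
            z = z * 5 + 1;                   // overlap them in the pipeline
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("single chain: %lld ms, two chains: %lld ms (checksums %llu %llu %llu)\n",
            (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<ms>(t2 - t1).count(),
            (unsigned long long)x, (unsigned long long)y, (unsigned long long)z);
    }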

Upvotes: 2

David Haim

Reputation: 26476

It really depends on what your code does; the question is too broad.

Having more threads than cores might speed up the program, for example, if some of the threads sleep or block on a lock. In that case the OS scheduler can wake a different thread, and that thread will do work while the other is sleeping.
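
A minimal sketch of that case (the sleep below just stands in for a blocking call such as I/O or waiting on a lock; the thread count and duration are arbitrary): even on a two-core machine, eight such threads finish in roughly 100 ms of wall time, because sleeping threads don't compete for the cores.

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const unsigned cores = std::thread::hardware_concurrency();
        const unsigned nthreads = 8;         // deliberately more than cores

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < nthreads; ++i)
            workers.emplace_back([] {
                // Stand-in for a blocking syscall (I/O, lock wait, ...).
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            });
        for (auto& t : workers)
            t.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();

        std::printf("%u threads on %u cores finished in ~%lld ms\n",
                    nthreads, cores, (long long)ms);
    }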

Having more threads than the number of cores may also increase the program's execution time, because the OS scheduler has to do more work switching between the threads, and that scheduling can be a heavy operation.

As always, benchmarking your application with different numbers of threads is the best way to achieve maximum performance. There are also algorithms (like hill climbing) that can help the application fine-tune the number of threads at runtime.
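
A sketch of that benchmarking approach (the workload, the iteration count, and the list of thread counts are placeholders to adapt): the loop below times the same CPU-bound job split across 1, 2, 4, 8 and 16 threads. Once the thread count passes the number of cores, the times typically stop improving and may start to degrade, which matches the context-switch argument above.

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Arbitrary CPU-bound work: iterate a cheap integer recurrence.
    static std::uint64_t burn(std::uint64_t iters) {
        std::uint64_t x = 1;
        for (std::uint64_t i = 0; i < iters; ++i)
            x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        return x;
    }

    int main() {
        constexpr std::uint64_t total_work = 400'000'000;   // total iterations
        std::printf("hardware threads: %u\n", std::thread::hardware_concurrency());

        for (unsigned nthreads : {1u, 2u, 4u, 8u, 16u}) {
            auto start = std::chrono::steady_clock::now();

            std::vector<std::thread> workers;
            std::vector<std::uint64_t> results(nthreads);
            for (unsigned i = 0; i < nthreads; ++i)
                workers.emplace_back([&, i] { results[i] = burn(total_work / nthreads); });
            for (auto& t : workers)
                t.join();

            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("%2u threads: %lld ms\n", nthreads, (long long)ms);
        }
    }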

Upvotes: 2
