GRVigo

Reputation: 43

What's the "real world" performance improvement for multithreading I can expect?

I'm programming a recursive tree search with multiple branches, and it works fine. To speed it up I'm implementing simple multithreading: I split the search into its main branches and scatter them among the threads. The threads don't have to interact with each other, and when a solve is found I add it to a common std::vector protected by a mutex, like this:

if (CubeTest.IsSolved())
{ // Solve algorithm found
    std::lock_guard<std::mutex> guard(SearchMutex); // Thread safe code
    Solves.push_back(Alg);  // Add the solve
}

I don't allocate variables on the heap with new and delete, since the memory needs are small. The maximum number of threads I use is the value returned by std::thread::hardware_concurrency().
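To illustrate, here is roughly how the setup looks, simplified and self-contained; Branch, Algorithm and SearchBranch are just placeholder names standing in for my real cube classes, not the actual code:

#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Placeholder stand-ins for the real search types (illustrative only).
struct Branch { int id; };
using Algorithm = std::string;

// Pretend search of one main branch; pushes a "solve" under the mutex like the snippet above.
static void SearchBranch(const Branch& b, std::vector<Algorithm>& Solves, std::mutex& SearchMutex)
{
    if (b.id % 2 == 0) // stand-in for CubeTest.IsSolved()
    {
        std::lock_guard<std::mutex> guard(SearchMutex);
        Solves.push_back("solve for branch " + std::to_string(b.id));
    }
}

int main()
{
    std::vector<Branch> Branches(100);
    for (std::size_t i = 0; i < Branches.size(); ++i)
        Branches[i].id = static_cast<int>(i);

    std::vector<Algorithm> Solves;
    std::mutex SearchMutex;

    const unsigned MaxThreads = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> Workers;
    for (unsigned t = 0; t < MaxThreads; ++t)
        Workers.emplace_back([&, t]()
        {
            // Round-robin: each worker handles every MaxThreads-th main branch.
            for (std::size_t i = t; i < Branches.size(); i += MaxThreads)
                SearchBranch(Branches[i], Solves, SearchMutex);
        });
    for (auto& w : Workers) w.join();
}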

I did some tests, always running the same search but changing the number of threads used, and I found results I didn't expect. I know that if you double the number of threads (provided the processor has enough capacity) you can't expect the performance to double, because of context switching and similar overhead.

For example, I have an old Intel Xeon X5650 with 6 cores / 12 threads. When I run my code, things behave as expected up to the sixth thread, but adding a seventh thread makes the performance worse. Using more threads increases the performance very little, to the point that using all available threads (12) barely improves on using only 6:

[Chart: threads vs. processing time for the Xeon X5650]

(I repeated each test several times and show the average time over all runs.)

I repeated the tests on another computer with an Intel i7-4600U (2 cores / 4 threads) and found this:

[Chart: threads vs. processing time for the i7-4600U]

I understand that with fewer cores the performance gain from using more threads is smaller.

I also think that performance is penalized in some way when you start using a second thread on the same core. Am I right? How can I improve the performance in this situation?

So my question is whether these performance gains from multithreading are what I can expect in the real world, or whether these numbers are telling me that I'm doing something wrong and should learn more about multithreaded programming.

Upvotes: 2

Views: 863

Answers (2)

eerorika

Reputation: 238281

What's the “real world” performance improvement for multithreading I can expect?

It depends on many factors. In general, the most optimistic improvement one can hope for is a reduction of the runtime by a factor equal to the number of cores¹. In most cases this is unachievable because of the need for threads to synchronise with one another.
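A rough way to quantify that limit is Amdahl's law: if a fraction p of the work can run in parallel and the remaining 1 - p is serial (synchronisation, merging results, and so on), then the best speedup on N cores is about

    S(N) = 1 / ((1 - p) + p / N)

For example, with p = 0.9, even an unlimited number of cores caps the speedup at 1 / 0.1 = 10x.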

In the worst case, not only is there no improvement due to a lack of parallelism, but the overhead of synchronisation as well as cache contention can make the runtime much worse than that of the single-threaded program.

Peak memory use often increases linearly with the number of threads, because each thread needs to operate on its own data.

Total CPU time usage, and therefore energy use, also increases due to the extra time spent on synchronisation. This is relevant to systems that run on battery power as well as those with poor heat management (both apply to phones and laptops).

Binary size would be marginally larger due to extra code that deals with threads.


¹ Whether you get the full performance out of "logical" cores, i.e. "hyper-threading" or "clustered multi-threading", also depends on many factors. Often one executes the same function in all threads, in which case they tend to use the same parts of the CPU, and then sharing a core between multiple threads doesn't necessarily yield a benefit.

Upvotes: 2

Philipp

Reputation: 69663

A CPU which uses hyperthreading claims to be able to execute two threads simultaneously on one core. But actually it doesn't. It just pretends to be able to do that. Internally it performs preemptive multitasking: Execute a bit of thread A, then switch to thread B, execute a bit of B, back to A and so on.

So what's the point of hyperthreading at all?

The thread switches inside the CPU are faster than thread switches managed by the thread scheduler of the operating system. So the performance gains come mostly from avoiding the overhead of thread switches. But it does not allow the CPU core to perform more operations than it did before.

Conclusion: The performance gain you can expect from concurrency depends on the number of physical cores of the CPU, not logical cores.

Also keep in mind that thread synchronization methods like mutexes can become pretty expensive, so the less locking you can get away with, the better. When multiple threads fill the same result set, it can sometimes be better to let each thread build its own result set and then merge those sets once all threads are finished, as sketched below.
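For example, a minimal sketch of that idea could look like this; the Algorithm type and the dummy search loop are placeholders standing in for the real cube search:

#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

using Algorithm = std::string; // placeholder for the question's solve type

int main()
{
    const unsigned NumThreads = std::max(1u, std::thread::hardware_concurrency());

    // One private result set per thread: no mutex is needed while searching.
    std::vector<std::vector<Algorithm>> PerThread(NumThreads);

    std::vector<std::thread> Workers;
    for (unsigned t = 0; t < NumThreads; ++t)
        Workers.emplace_back([&, t]()
        {
            // Stand-in for the real branch search: append solves to the thread's own vector.
            for (std::size_t i = t; i < 100; i += NumThreads)
                if (i % 2 == 0)
                    PerThread[t].push_back("solve " + std::to_string(i));
        });
    for (auto& w : Workers) w.join();

    // Single merge step once all threads are done.
    std::vector<Algorithm> Solves;
    for (auto& part : PerThread)
        Solves.insert(Solves.end(), part.begin(), part.end());
}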

Upvotes: 1
