Reputation: 638
I've noticed that, with this simple example, multi-threading almost always takes longer than running on a single thread. I'm just testing it out with the code below, which I wrote myself, on a 24-core processor. It seems to work best with 2 threads; using 3 or more threads is worse than using 1.
#include <thread>
#include <mutex>
#include <iostream>
using namespace std;

mutex total;                  // guards mysum
long long sum = 1000000000;   // exclusive upper bound of the range to add up
long long mysum = 0;          // shared result
const int threads = 3;        // number of worker threads

// Adds up this thread's slice [x*sum/threads, (x+1)*sum/threads) locally,
// then merges the partial result into mysum under the lock.
void dowork(int x, int threads) {
    long long temp = 0;
    for (long long i = x * sum / threads; i < ((x + 1) * sum / threads); i++) {
        temp += i;
    }
    total.lock();
    mysum += temp;
    total.unlock();
}

int main() {
    thread* pool[threads];
    for (int x = 0; x < threads; x++) {
        pool[x] = new thread(dowork, x, threads);
    }
    for (int x = 0; x < threads; x++) {
        pool[x]->join();
        delete pool[x];   // release the thread object allocated above
    }
    cout << "My sum is: " << mysum << endl;
}
Upvotes: 2
Views: 159
Reputation: 161
The loop in dowork() can be reduced to O(1) code that evaluates the following equation:
temp = (b - a + 1) * a + (b - a) * (b - a + 1) / 2
where a = x * sum / threads and b = (x + 1) * sum / threads - 1.
For instance, clang++ 3.5.1 actually generates such code. In that case, each thread does only a constant amount of work, so the total cost is, unfortunately, proportional to the number of threads.
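To make the equation concrete, here is a minimal sketch that compares the loop against the closed form. The function names range_sum_loop and range_sum_closed are made up for illustration and are not part of the question's code; begin and end stand for x * sum / threads and (x + 1) * sum / threads.

```cpp
// Loop version: what dowork() computes for the half-open range [begin, end).
long long range_sum_loop(long long begin, long long end) {
    long long temp = 0;
    for (long long i = begin; i < end; ++i) {
        temp += i;
    }
    return temp;
}

// Closed form from the equation above, with a = begin and b = end - 1.
// Assumes the intermediate products fit in a long long, which holds for
// the question's sum of 1000000000.
long long range_sum_closed(long long begin, long long end) {
    if (end <= begin) {
        return 0;  // empty range
    }
    const long long a = begin;
    const long long b = end - 1;
    return (b - a + 1) * a + (b - a) * (b - a + 1) / 2;
}
```

Both functions should agree for any range, but the second one does the same amount of arithmetic no matter how wide the range is, so splitting it across threads cannot make it faster.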
Upvotes: 2
Reputation: 9527
Your code is so simple that the compiler probably applies some optimizations (such as auto-vectorization) in the single-core run.
Creating a new thread is also a fairly expensive operation, and a single thread can finish before your threads have even been created. A common practice is to create a thread pool once and then reuse threads from that pool. They don't need to be allocated again, so using them is faster at runtime. But that is not meant for an app as simple as this one.
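As a rough illustration of the thread-pool idea, here is a minimal sketch. SimplePool and pooled_sum are hypothetical names invented for this example, not a standard library facility, and a real pool would also need to handle exceptions and task results.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: workers are created once and reused for every task.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queue a task; one sleeping worker is woken to pick it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        wake_.notify_one();
    }
    // Block until every submitted task has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) idle_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};

// Sums 0..n-1 in `chunks` slices on an existing pool (same slicing as dowork()).
long long pooled_sum(SimplePool& pool, long long n, int chunks) {
    std::mutex total_lock;
    long long total = 0;
    for (int c = 0; c < chunks; ++c) {
        const long long begin = c * n / chunks;
        const long long end = (c + 1) * n / chunks;
        pool.submit([&total, &total_lock, begin, end] {
            long long temp = 0;
            for (long long i = begin; i < end; ++i) temp += i;
            std::lock_guard<std::mutex> lock(total_lock);
            total += temp;
        });
    }
    pool.wait_idle();  // the captured locals stay alive until all tasks are done
    return total;
}
```

With a pool constructed once, for example SimplePool pool(4), repeated calls such as pooled_sum(pool, 1000000, 8) reuse the same four threads instead of paying the creation cost each time. For a single short computation like the one in the question, the pool itself is still overhead.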
Upvotes: 1