Reputation: 233
Here is the main part of my code. To speed it up I am using multiple threads, as shown below. My basic idea is simply to split the work into 12 chunks and let each thread process its chunk separately:
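const int Thread_num = 12;          // must be a compile-time constant to size the array below
int firstone = 0;
int lastone = static_cast<int>(vector.size());
int chunk = (lastone - firstone + (Thread_num - 1)) / Thread_num;   // ceiling division
std::thread t[Thread_num];
for (int i = 0; i < Thread_num; i++)
{
    int s = firstone + i * chunk;
    int e = (s + chunk < lastone) ? (s + chunk) : lastone;   // last chunk may be shorter
    t[i] = std::thread(calculateAll, data, arr, s, e);
}
for (int i = 0; i < Thread_num; ++i)
{
    t[i].join();   // wait for every worker to finish
}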
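To make the chunking arithmetic in the loop above easy to check in isolation, here is a minimal sketch that factors it into a helper. The name chunk_bounds is hypothetical (not part of the question's code); it uses the same ceiling division and returns one half-open [s, e) range per thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into num_chunks contiguous
// half-open pieces [s, e) using the same ceiling division as the loop above.
// Requires num_chunks > 0. Trailing pieces can be short or even empty.
std::vector<std::pair<std::size_t, std::size_t>>
chunk_bounds(std::size_t first, std::size_t last, std::size_t num_chunks)
{
    std::vector<std::pair<std::size_t, std::size_t>> bounds;
    std::size_t chunk = (last - first + num_chunks - 1) / num_chunks;
    for (std::size_t i = 0; i < num_chunks; ++i) {
        std::size_t s = std::min(first + i * chunk, last);  // clamp the start as well
        std::size_t e = std::min(s + chunk, last);
        bounds.emplace_back(s, e);
    }
    return bounds;
}
```

For example, chunk_bounds(0, 100, 12) gives chunk = 9, so the first eleven ranges hold 9 elements each and the last one, [99, 100), holds only 1.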
And here is the calculateAll function (not the exact code). I use a lock around the push_back so that the threads don't write into that list vector at the same time (the order does not matter):
void calculateAll(int ***data, LineIndex* arr, int s, int e)
{
    for (int a = s; a < e; a++)
    {
        function_1(arr);                  // do something with array (arr)
        auto result = function_2(data);   // do something with data
        mylock.lock();                    // serialize access to the shared result list
        list.push_back(result);
        mylock.unlock();
    }
}
So theoretically, should this be 12 times faster? When I use this idea in my code, it only speeds up about 5 to 6 times. Does this make sense? And can I change something to get better performance, maybe with some other method? Any help is appreciated.
Upvotes: 0
Views: 143
Reputation: 76295
Those lock() and unlock() calls are killing performance, turning your parallel algorithm into a more-or-less serial one. As one of the comments suggests, give each thread its own list to store its results in, and when all the threads have finished, consolidate the results.
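A minimal sketch of that suggestion, under stated assumptions: the question's function_1/function_2 work is replaced by a hypothetical compute() that just squares its index, and the shared list becomes one std::vector per thread that is concatenated after join(). The names calculate_parallel and compute are illustrative, not from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real per-element work (function_1/function_2).
int compute(int a) { return a * a; }

// Each thread fills a private vector, so no mutex is needed in the loop.
// After join(), the per-thread vectors are concatenated. Requires num_threads > 0.
std::vector<int> calculate_parallel(int n, int num_threads)
{
    std::vector<std::vector<int>> partial(num_threads);  // one result list per thread
    std::vector<std::thread> threads;
    int chunk = (n + num_threads - 1) / num_threads;      // ceiling division

    for (int i = 0; i < num_threads; ++i) {
        int s = i * chunk;
        int e = std::min(s + chunk, n);
        threads.emplace_back([s, e, i, &partial] {
            std::vector<int> local;                       // owned by this thread only
            local.reserve(e > s ? static_cast<std::size_t>(e - s) : 0);
            for (int a = s; a < e; ++a)
                local.push_back(compute(a));              // no lock needed
            partial[i] = std::move(local);                // single write to shared storage
        });
    }
    for (auto& t : threads)
        t.join();

    // Consolidate: reserve once, then append each thread's results.
    std::size_t total = 0;
    for (const auto& p : partial) total += p.size();
    std::vector<int> result;
    result.reserve(total);
    for (const auto& p : partial)
        result.insert(result.end(), p.begin(), p.end());
    return result;
}
```

For example, calculate_parallel(100, 12) returns the 100 values in index order (a side effect of concatenating chunks in order; the question says order doesn't matter). Each thread builds its results in a local vector and moves it into partial[i] once at the end. Pushing directly into partial[i] would also be correct, but adjacent vector objects likely share a cache line, so every push_back from different cores would contend for it (false sharing).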
On a different tack: when you say your server has 14 cores, are those actual physical cores, or is it 7 cores with two hyper-threads each? If it's the latter, the two hyper-threads on each core share that core's execution units, so they interfere with each other and you don't get the full speedup that you'd get from separate cores.
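On the core-count question: portable C++ can report the number of hardware threads, but not the number of physical cores. std::thread::hardware_concurrency() returns the number of concurrent hardware threads (on a hyper-threaded machine, both hyper-threads of each core are counted) and may return 0 when the value is unknown. Below is a small sketch of using it to pick a thread count instead of hard-coding 12; pick_thread_count and its fallback value are hypothetical. To check whether those are physical cores, on Linux `lscpu` reports "Thread(s) per core", where a value of 2 means hyper-threading is enabled.

```cpp
#include <thread>

// Returns the number of hardware threads (logical CPUs), which counts
// hyper-threads, or `fallback` if the implementation reports 0 (unknown).
unsigned pick_thread_count(unsigned fallback = 4)
{
    unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```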
Upvotes: 1