Reputation: 183
I would like to ask how I can parallelize the following loop.
It is currently crashing. I searched, and most answers suggest that the problem is my use of std::vector. I tried switching to a fixed-size std::vector, but the application still crashes. Could you tell me what is wrong with the following loop?
std::vector<int> a(pairsListFlags.size());
std::generate(a.begin(), a.end(), [n = 0]() mutable { return n++; });
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {
    int a = pairsList[i * 2];
    int b = pairsList[i * 2 + 1];
    if (getCollisionOpenNurbs(OBB[a], OBB[b])) { // Check OBB collision +4-6 ms
        if (FaceFace(P[a], P[b], Pl[a], Pl[b])) { // Check polygon intersection +20 ms
            pairsListFlags[i] = 1;
        }
    }
});
Upvotes: 0
Views: 1157
Reputation: 50338
Your problem is not embarrassingly parallel, so you should not use std::for_each here (at least not without synchronisation mechanisms or atomics, which would be inefficient). Instead, you can perform a reduction. Note that std::reduce expects a binary operation, so the per-index work belongs in the transform step of std::transform_reduce. Here is an example:
std::vector<int> a(pairsListFlags.size());
std::generate(a.begin(), a.end(), [n = 0]() mutable { return n++; });
// std::transform_reduce (from <numeric>) maps each index to 0 or 1, then sums the results with std::plus.
int counter = std::transform_reduce(std::execution::par_unseq, std::begin(a), std::end(a), 0, std::plus<>{}, [&](int i) {
    // Locals renamed so they no longer shadow the index vector a.
    int first = pairsList[i * 2];
    int second = pairsList[i * 2 + 1];
    if (getCollisionOpenNurbs(OBB[first], OBB[second])) { // Check OBB collision +4-6 ms
        if (FaceFace(P[first], P[second], Pl[first], Pl[second])) { // Check polygon intersection +20 ms
            pairsListFlags[i] = 1;
            return 1;
        }
    }
    return 0;
});
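For reference, here is a minimal self-contained sketch of the same map-then-sum pattern. The names collides and flag_collisions are hypothetical, and collides is only a toy stand-in for the real getCollisionOpenNurbs/FaceFace checks. It uses the sequential overload of std::transform_reduce so that it builds without a parallel backend; passing an execution policy as the first argument is what would request parallel execution.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the real collision tests:
// two "objects" collide when their integer ids differ by at most 1.
inline bool collides(int lhs, int rhs) {
    return lhs - rhs <= 1 && rhs - lhs <= 1;
}

// Flags every colliding pair and returns how many pairs collided.
// pairs holds two entries per pair; flags holds one entry per pair.
inline int flag_collisions(const std::vector<int>& pairs, std::vector<char>& flags) {
    assert(pairs.size() == flags.size() * 2);

    // Index vector 0, 1, 2, ... (one index per pair).
    std::vector<std::size_t> index(flags.size());
    std::iota(index.begin(), index.end(), std::size_t{0});

    // Sequential overload. With <execution>, an execution policy such as
    // std::execution::par could be passed as the first argument instead.
    return std::transform_reduce(
        index.begin(), index.end(), 0, std::plus<>{},
        [&](std::size_t i) {
            const bool hit = collides(pairs[2 * i], pairs[2 * i + 1]);
            flags[i] = hit ? 1 : 0;  // each index writes only its own slot
            return hit ? 1 : 0;      // contribution to the sum
        });
}
```

Calling flag_collisions on a pair list and a flag vector of half its length fills the flags and returns the number of hits. The flags are stored as char here on the assumption that each element should be a separately addressable object; whether that matches the real pairsListFlags type is not known from the question.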
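Note that you should be careful about false sharing on pairsListFlags: when several threads write to neighbouring elements that sit on the same cache line, the line bounces between cores. This can slightly reduce the performance of the resulting code, but it has no impact on the result.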
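If false sharing ever shows up in profiling, one possible mitigation is to give each flag its own cache line. The sketch below is only an illustration: PaddedFlag, count_set, and stride_bytes are hypothetical names, and the 64-byte line size is an assumption that holds on many x86-64 machines but is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache-line size; 64 bytes is common but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// One flag per cache line, so writes to neighbouring flags
// never touch the same line.
struct alignas(kCacheLine) PaddedFlag {
    int value = 0;
};

static_assert(sizeof(PaddedFlag) == kCacheLine, "padding should fill one line");

// Hypothetical helper: counts the flags that were set.
inline int count_set(const std::vector<PaddedFlag>& flags) {
    int total = 0;
    for (const PaddedFlag& f : flags) {
        total += (f.value != 0) ? 1 : 0;
    }
    return total;
}

// Byte distance between two consecutive flags in the vector.
inline std::ptrdiff_t stride_bytes(const std::vector<PaddedFlag>& flags) {
    assert(flags.size() >= 2);
    return reinterpret_cast<const char*>(&flags[1]) -
           reinterpret_cast<const char*>(&flags[0]);
}
```

This trades memory for isolation, since every flag now occupies a full line. With each pair costing several milliseconds of work, the flag writes are rare compared to the computation, so it is worth measuring before adopting anything like this.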
Upvotes: 2