Reputation: 2735
Since c++17, the std library has parallel algorithms, so I tried with the following code, summing a list of numbers and want to see if there is any performance gains.
#include <algorithm>
#include <chrono>
#include <execution>
#include <numeric>
#include <iostream>
#include <vector>
int main() {
size_t n = 100000000;
std::vector<size_t> vec(n);
std::iota(vec.begin(), vec.end(), 0);
auto par_sum = [&](size_t k) {
auto t1 = std::chrono::high_resolution_clock::now();
std::vector<size_t> rez(k);
std::iota(rez.begin(), rez.end(), 0);
size_t batch = static_cast<size_t>(n / k) + 1;
std::for_each(std::execution::par_unseq, rez.begin(), rez.end(),
[&](size_t id) {
size_t cum = 0;
for (size_t i = id*batch; i < std::min((id+1)*batch, n); ++i) {
cum += vec[i];
}
rez[id] = cum;
});
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
std::cout << "n_worker = " << k
<< ", time = " << duration
<< ", rez = " << std::accumulate(rez.begin(), rez.end(), 0lu)
<< std::endl;
};
par_sum(1);
par_sum(3);
par_sum(5);
}
Compiled by
g++ -std=c++17 -L/usr/local/lib -O3 -mavx -ltbb a.cpp
Results show that
n_worker = 1, time = 51875, rez = 4999999950000000
n_worker = 3, time = 57616, rez = 4999999950000000
n_worker = 5, time = 63193, rez = 4999999950000000
Questions,
Upvotes: 2
Views: 2260
Reputation: 2573
I would posit that for a small amount of work it may be the case that a tight loop could execute in one CPU by purely keeping within the L1 cache context. As soon as you increase the amount of parallelism on the same data you begin to invoke overhead with cache consistency and page faults.
It would probably be worth you looking at ways of counting cache misses such as those suggested here: Programmatically counting cache faults
and also see: What is a "cache-friendly" code?
plus other resources on "cache friendly code" and "data oriented design": https://www.youtube.com/watch?v=b5v9aElYU2I
and https://youtu.be/fHNmRkzxHWs
This is related to false sharing which is explained here: https://www.youtube.com/watch?v=dznxqe1Uk3E
Upvotes: 3