Reputation: 3534
Using this Boost.Asio-based thread pool (in this case the class is named ThreadPool), I want to parallelize the population of a vector of type std::vector<boost::shared_ptr<T>>, where T is a struct containing a vector of type std::vector<int> whose content and size are dynamically determined after struct initialization.
Unfortunately, I am a newb at both C++ and multithreading, so my attempts at solving this problem have failed spectacularly. Here's an overly simplified sample program that times the non-threaded and threaded versions of the task. The threaded version's performance is horrendous...
#include "thread_pool.hpp"
#include <ctime>
#include <iostream>
#include <vector>

using namespace boost;
using namespace std;

struct T {
    vector<int> nums = {};
};

typedef boost::shared_ptr<T> Tptr;
typedef vector<Tptr> TptrVector;

void create_T(const int i, TptrVector& v) {
    v[i] = Tptr(new T());
    T& t = *v[i].get();
    for (int i = 0; i < 100; i++) {
        t.nums.push_back(i);
    }
}

int main(int argc, char* argv[]) {
    clock_t begin, end;
    double elapsed;

    // define and parse program options
    if (argc != 3) {
        cout << argv[0] << " <num iterations> <num threads>" << endl;
        return 1;
    }
    int iterations = stoi(argv[1]),
        threads = stoi(argv[2]);

    // create thread pool
    ThreadPool tp(threads);

    // non-threaded
    cout << "non-thread" << endl;
    begin = clock();
    TptrVector v(iterations);
    for (int i = 0; i < iterations; i++) {
        create_T(i, v);
    }
    end = clock();
    elapsed = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed << " seconds" << endl;

    // threaded
    cout << "threaded" << endl;
    begin = clock();
    TptrVector v2(iterations);
    for (int i = 0; i < iterations; i++) {
        tp.submit(boost::bind(create_T, i, v2));
    }
    tp.stop();
    end = clock();
    elapsed = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed << " seconds" << endl;

    return 0;
}
After doing some digging, I think the poor performance may be due to the threads vying for memory access, but my newb status is keeping me from exploiting this insight. Can you efficiently populate the pointer vector using multiple threads, ideally in a thread pool?
Upvotes: 1
Views: 211
Reputation: 16256
You haven't provided enough details or a Minimal, Complete, and Verifiable example, so expect a lot of guessing.
create_T is a "cheap" function. Scheduling a task and the overhead of executing it cost much more than the work the function does, which is why your performance is bad. To get a boost from parallelism you need proper work granularity and enough total work. Granularity means that each task (in your case, one call to create_T) should be big enough to pay for the multithreading overhead. The simplest approach would be to group create_T calls into bigger tasks.
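As a rough sketch of that grouping (not a drop-in fix), the code below splits the vector into one contiguous chunk per worker and runs each chunk as a single task. Several things here are assumptions rather than part of the question: plain std::thread stands in for the ThreadPool class, std::shared_ptr stands in for boost::shared_ptr, and create_range and populate_chunked are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

struct T {
    std::vector<int> nums;
};
typedef std::shared_ptr<T> Tptr;  // assumption: std:: stand-in for boost::shared_ptr
typedef std::vector<Tptr> TptrVector;

// Same per-element work as create_T in the question.
void create_T(std::size_t i, TptrVector& v) {
    v[i] = std::make_shared<T>();
    for (int n = 0; n < 100; ++n) {
        v[i]->nums.push_back(n);
    }
}

// Hypothetical helper: one coarse task that fills every slot in [first, last).
void create_range(std::size_t first, std::size_t last, TptrVector& v) {
    for (std::size_t i = first; i < last; ++i) {
        create_T(i, v);
    }
}

// Hypothetical helper: split v into at most num_threads contiguous chunks
// and run one task per chunk. Each thread writes to distinct elements only.
void populate_chunked(TptrVector& v, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t first = 0; first < v.size(); first += chunk) {
        const std::size_t last = std::min(first + chunk, v.size());
        // std::ref passes the vector by reference; std::thread (like
        // boost::bind) otherwise stores copies of its arguments.
        workers.emplace_back(create_range, first, last, std::ref(v));
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

With the pool from the question, the equivalent would presumably be one tp.submit(boost::bind(create_range, first, last, boost::ref(v2))) per chunk, assuming submit accepts any callable that takes no arguments, as it does in the question. Note the boost::ref: boost::bind stores its arguments by value, so binding v2 directly gives every task its own copy of the vector.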
Upvotes: 4