alan

Reputation: 3534

Boost: Creating objects and populating a vector with threads

Using this Boost.Asio-based thread pool (the class is named ThreadPool here), I want to parallelize the population of a std::vector<boost::shared_ptr<T>>, where T is a struct containing a std::vector<int> whose contents and size are determined dynamically after the struct is initialized.

Unfortunately, I am a newb at both C++ and multithreading, so my attempts at solving this problem have failed spectacularly. Here's an overly simplified sample program that times the non-threaded and threaded versions of the task. The threaded version's performance is horrendous...

#include "thread_pool.hpp"
#include <ctime>
#include <iostream>
#include <vector>


using namespace boost;
using namespace std;


struct T {
  vector<int> nums = {};
};


typedef boost::shared_ptr<T> Tptr;
typedef vector<Tptr> TptrVector;


void create_T(const int i, TptrVector& v) {
  v[i] = Tptr(new T());
  T& t = *v[i].get();
  for (int i = 0; i < 100; i++) {
    t.nums.push_back(i);
  }
}


int main(int argc, char* argv[]) {
  clock_t begin, end;
  double elapsed;

  // define and parse program options

  if (argc != 3) {
    cout << argv[0] << " <num iterations> <num threads>" << endl;
    return 1;
  }
  int iterations = stoi(argv[1]),
      threads    = stoi(argv[2]);

  // create thread pool
  ThreadPool tp(threads);

  // non-threaded
  cout << "non-thread" << endl;
  begin = clock();

  TptrVector v(iterations);
  for (int i = 0; i < iterations; i++) {
    create_T(i, v);
  }

  end = clock();
  elapsed = double(end - begin) / CLOCKS_PER_SEC;
  cout << elapsed << " seconds" << endl;

  // threaded
  cout << "threaded" << endl;
  begin = clock();

  TptrVector v2(iterations);
  for (int i = 0; i < iterations; i++) {
    tp.submit(boost::bind(create_T, i, v2));
  }
  tp.stop();

  end = clock();
  elapsed = double(end - begin) / CLOCKS_PER_SEC;
  cout << elapsed << " seconds" << endl;

  return 0;
}

After doing some digging, I think the poor performance may be due to the threads vying for memory access, but my newb status is keeping me from exploiting this insight. Can you efficiently populate the pointer vector using multiple threads, ideally in a thread pool?

Upvotes: 1

Views: 211

Answers (1)

Andriy Tylychko

Reputation: 16256

You have provided neither enough details nor a Minimal, Complete, and Verifiable example, so expect a lot of guessing.

create_T is a "cheap" function. Scheduling a task and the overhead of executing it are much more expensive than the work the task does, which is why your performance is bad. To get a boost from parallelism you need proper work granularity and a sufficient amount of work. Granularity means that each task (in your case, one call to create_T) should be big enough to pay for the multithreading overhead. The simplest approach would be to group create_T calls into bigger tasks.
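The grouping idea could look roughly like the sketch below. It is only an illustration under stated assumptions: plain std::thread and std::shared_ptr stand in for the question's ThreadPool and boost::shared_ptr (thread_pool.hpp is not shown), and create_range and populate_chunked are hypothetical names, not part of the original code. Each task fills one contiguous range of slots instead of a single element, so the scheduling cost is paid once per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

struct T {
  std::vector<int> nums;
};

using Tptr = std::shared_ptr<T>;
using TptrVector = std::vector<Tptr>;

// Hypothetical helper: fills the slots [first, last) of v.
// Every slot is written by exactly one thread, so no locking is needed.
void create_range(TptrVector& v, std::size_t first, std::size_t last) {
  for (std::size_t i = first; i < last; ++i) {
    auto t = std::make_shared<T>();
    t->nums.reserve(100);  // one allocation instead of repeated regrowth
    for (int j = 0; j < 100; ++j) {
      t->nums.push_back(j);
    }
    v[i] = std::move(t);
  }
}

// Hypothetical driver: splits [0, n) into at most num_threads contiguous
// chunks and runs one task per chunk (assumption: std::thread, not a pool).
TptrVector populate_chunked(std::size_t n, unsigned num_threads) {
  assert(num_threads > 0);
  TptrVector v(n);  // sized up front; the vector itself is never resized
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t first = 0; first < n; first += chunk) {
    const std::size_t last = std::min(n, first + chunk);
    // std::ref passes v by reference; without it the thread gets a copy.
    workers.emplace_back(create_range, std::ref(v), first, last);
  }
  for (auto& w : workers) {
    w.join();
  }
  return v;
}
```

With a pool, the same shape would apply: submit one create_range-style task per chunk rather than one task per element. Two further things in the posted program are worth checking, although they are separate from granularity. boost::bind stores its arguments by value, so boost::bind(create_T, i, v2) hands each task its own copy of v2 unless the vector is wrapped in boost::ref(v2). And clock() reports processor time on POSIX systems, which adds up the time spent in all threads, so a wall-clock timer is the better fit for comparing the two versions.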

Upvotes: 4
