ev-br
ev-br

Reputation: 26070

openmp parallel for with non-PODs

I'm trying to speed up a program, in the heart of which is a trivially-looking loop:

double sum=0.;
#pragma omp parallel for reduction(+:sum) // fails
for( size_t i=0; i<_S.size(); ++i ){
  sum += _S[i].first* R(atoms,_S[i].second) ;
}

While the looping itself is trivial, the objects inside it are not PODs: Here _S is in fact an std::vector< std::pair<double, std::vector<size_t> > >, and R(...) is an overloaded operator(...) const of some object. Both of its arguments qualified as const, so that the call does not have any side effects.

Since some 90% of the runtime is spent in this call, it seemed a simple thing to throw in an OpenMP pragma as shown above, and enjoy a speedup by a factor of two or three; but of course --- the code works OK with a single thread, but gives plain wrong results for more than one thread :-).

There is no data dependency, both _S and R(...) seem to be safe to share between threads, but still it produces nonsense.

I'd really appreciate any pointers into how to find what goes wrong.

UPD2:

Figured it. As all bugs, it's trivial. The R(...) was calling the operator() of something of this sort:

class objR{
 public:
   objR(const size_t N){
     _buffer.reserve(N);
   };
   double operator(...) const{
     // do something, using the _buffer to store intermediaries
   }

 private:
   std::vector<double> _buffer;
};

Clearly, different threads use the _buffer at the same time and mess it up. My solution so far is to allocate more space (memory is not a problem, the code is CPU-bound):

class objR{
 public:
   objR(const size_t N){
     int nth=1;
     #ifdef _OPENMP
       nth=omp_get_max_threads();
     #endif
     _buffer.resize(N);
   }

   double operator(...) const{
     int thread_id=0;
     #ifdef _OPENMP
       thread_id = omp_get_thread_num();
     #endif
     // do something, using the _buffer[thread_id] to store intermediaries
   }

 private:
   std::vector< std::vector<double> > _buffer;
};

This seems to work correctly. Still, since it's my first foray into things mutithreaded, I'd appreciate if somebody knowledgeable could comment on if there is a better approach.

Upvotes: 0

Views: 258

Answers (1)

Puppy
Puppy

Reputation: 146968

The access to _S[i].first and _S[i].second is perfectly safe (can't guarantee anything about atoms). This means that your function call to R must be what's causing the problem. You need to find out what R is and post what it's doing.

In another point, names which begin with an underscore and begin with an uppercase character are reserved for the implementation and you invoke undefined behaviour by using them.

Upvotes: 1

Related Questions