icb23

Reputation: 43

Converting a parallel program from OpenMP to OpenCL

I am wondering how to convert the following OpenMP program to an OpenCL program.

The parallel section of the algorithm, implemented using OpenMP, looks like this:

#pragma omp parallel
{
    int thread_id = omp_get_thread_num();

    //double mt_probThreshold = mt_nProbThreshold_;
    double mt_probThreshold = nProbThreshold;

    int mt_nMaxCandidate = mt_nMaxCandidate_;
    double mt_nMinProb = mt_nMinProb_;

    int has_next = 1;
    std::list<ScrBox3d> mt_detected;
    ScrBox3d sample;
    while (has_next) {
#pragma omp critical
        {   // '{' is very important: it defines the block of code that needs the lock.
            // Don't remove this pair of '{' and '}'.
            if (piter_ == box_.end()) {
                has_next = 0;
            } else {
                sample = *piter_;
                ++piter_;
            }
        }   // '}' is very important: it defines the block of code that needs the lock.

        if (has_next) {
            this->SetSample(&sample, thread_id);
            //UpdateSample(sample, thread_id); // May be necessary for more sophisticated features
            sample._prob = (float)this->Prob(true, thread_id, mt_probThreshold);
            //sample._prob = (float)_clf->LogLikelihood(thread_id);
            InsertCandidate(mt_detected, sample, mt_probThreshold, mt_nMaxCandidate, mt_nMinProb);
        }
    }

#pragma omp critical
    {   // '{' is very important: it defines the block of code that needs the lock.
        // Don't remove this pair of '{' and '}'.
        if (mt_detected_.size() == 0) {
            mt_detected_ = mt_detected;
            //mt_nProbThreshold_ = mt_probThreshold;
            nProbThreshold = mt_probThreshold;
        } else {
            for (std::list<ScrBox3d>::iterator it = mt_detected.begin();
                 it != mt_detected.end(); ++it)
                InsertCandidate(mt_detected_, *it, /*mt_nProbThreshold_*/nProbThreshold,
                                mt_nMaxCandidate_, mt_nMinProb_);
        }
    }   // '}' is very important: it defines the block of code that needs the lock.
}   // parallel section end
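For an OpenCL port, the shared `piter_` iterator and the per-item critical section would have to go: a kernel is launched over an index range, so the samples need to live in a contiguous array that each work-item indexes by its global id. A minimal host-side sketch of that restructuring, in plain C++ so it runs without a device, might look like this. `Box`, `scoreBox` and `detectTopK` are hypothetical stand-ins (not the original `ScrBox3d`, `Prob` or `InsertCandidate`), and the incremental `InsertCandidate` logic, including `nMinProb` and the threshold update, is swapped for a plain `std::partial_sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for ScrBox3d: only an id and the probability field.
struct Box {
    int id;
    float prob;
};

// Hypothetical stand-in for this->Prob(): the result depends only on the box
// itself, so every index can be scored independently.
float scoreBox(const Box& b) {
    return static_cast<float>((b.id * 37) % 100) / 100.0f;
}

// Flatten the list, score every element independently, then keep the top K.
std::vector<Box> detectTopK(const std::list<Box>& boxes, std::size_t maxCandidates) {
    // 1. Contiguous copy: a device buffer needs an indexable array, not list nodes.
    std::vector<Box> samples(boxes.begin(), boxes.end());

    // 2. Data-parallel phase: no shared iterator and no lock per item.
    //    This loop body is what one work-item would execute for index i.
    for (std::size_t i = 0; i < samples.size(); ++i) {
        samples[i].prob = scoreBox(samples[i]);
    }

    // 3. Selection phase: keep the maxCandidates highest scores, highest first.
    auto higherProb = [](const Box& a, const Box& b) { return a.prob > b.prob; };
    std::size_t k = std::min(maxCandidates, samples.size());
    std::partial_sort(samples.begin(), samples.begin() + static_cast<std::ptrdiff_t>(k),
                      samples.end(), higherProb);
    samples.resize(k);
    assert(std::is_sorted(samples.begin(), samples.end(), higherProb));
    return samples;
}
```

Only the loop in step 2 is a candidate for a kernel in this sketch; steps 1 and 3 stay on the host.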

My question is: can this section be implemented with OpenCL? I followed a series of OpenCL tutorials and understood how it works, and I wrote that code in .cu files (I had previously installed the CUDA Toolkit). In this case, however, the situation is more complicated, because the code uses a lot of header files, template classes and object-oriented programming.

How could I convert this section from OpenMP to OpenCL? Should I create a new .cu file?

Any advice could help. Thanks in advance.

Edit:

Using the Visual Studio profiler, I noticed that most of the execution time is spent in the InsertCandidate() function, so I'm thinking about writing a kernel to execute this function on the GPU. The most expensive part of this function is a for loop. However, as can be seen below, each iteration of the loop contains three if statements, which can lead to branch divergence and therefore serialization, even when executed on a GPU.

for( iter = detected.begin(); iter != detected.end(); iter++ )
    {
        if( nCandidate == nMaxCandidate-1 )
            nProbThreshold = iter->_prob;

        if( box._prob >= iter->_prob )
            break;
        if( nCandidate >= nMaxCandidate && box._prob <= nMinProb )
            break;
        nCandidate ++;
    }
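The loop above is essentially a linear search for an insertion position. Assuming `detected` is kept sorted by descending `_prob`, the index where it stops on `box._prob >= iter->_prob` equals the number of entries whose `_prob` is strictly greater than `box._prob`. That count has no early exit (every comparison is independent, so a GPU version could compare one element per work-item and sum the results), and on the CPU the same index can be found by binary search. The sketch below shows both and checks that they agree. `Cand` and `insertBounded` are hypothetical: `insertBounded` is a simplified guess at what InsertCandidate does (insert, then truncate to the maximum), since its full body is not shown, and the `nMinProb` break and `nProbThreshold` update are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ScrBox3d.
struct Cand {
    float prob;
};

// Count-based formulation: number of entries ranked strictly above p.
// No early exit: every element is compared, so this is a map + sum reduction.
std::size_t insertPosByCount(const std::vector<Cand>& sorted, float p) {
    std::size_t count = 0;
    for (const Cand& c : sorted) {
        count += static_cast<std::size_t>(c.prob > p);
    }
    return count;
}

// CPU formulation: binary search for the first entry with prob <= p, i.e. the
// entry where the original loop hits `box._prob >= iter->_prob` and breaks.
std::size_t insertPosBySearch(const std::vector<Cand>& sorted, float p) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), p,
                               [](const Cand& c, float value) { return c.prob > value; });
    return static_cast<std::size_t>(it - sorted.begin());
}

// Hypothetical simplified InsertCandidate: keep at most maxCandidates entries,
// sorted by descending prob. Returns false if the box does not make the cut.
bool insertBounded(std::vector<Cand>& sorted, Cand box, std::size_t maxCandidates) {
    std::size_t pos = insertPosBySearch(sorted, box.prob);
    assert(pos == insertPosByCount(sorted, box.prob)); // both formulations agree
    if (pos >= maxCandidates) return false;
    sorted.insert(sorted.begin() + static_cast<std::ptrdiff_t>(pos), box);
    if (sorted.size() > maxCandidates) sorted.pop_back();
    return true;
}
```

With a capped, sorted array the search is O(log n) per insert, which may remove the hotspot on the CPU without involving a GPU at all.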

In short: can this program be converted to OpenCL?

Upvotes: 3

Views: 725

Answers (1)

mfa

Reputation: 5087

It may be possible to convert your sample code to OpenCL; however, I spotted a couple of issues with doing so.

  1. There doesn't seem to be much parallel execution to begin with. More workers may not help at all.
  2. Adding work to process during execution is a fairly recent feature in OpenCL. You would have to either use OpenCL 2.0, or know in advance how much work will be added and pre-allocate memory to store the new data structures. The calls to InsertCandidate may be the part that "can't" be converted to OpenCL.
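The second option, knowing an upper bound in advance and pre-allocating the output, is commonly implemented as a fixed-capacity output buffer plus a global counter that each work-item increments atomically to reserve a slot (in OpenCL C, for example, with `atomic_inc`). Below is a host-side C++ analogue of that pattern, using `std::atomic` and calling a hypothetical `workItem` function in a loop instead of launching a kernel. `OutputBuffer`, `workItem`, `storedCount` and `overflowed` are illustrative names, not part of any OpenCL API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity output allocated before the "kernel" runs, plus a counter
// that hands out slots (host-side analogue of a device buffer + global counter).
struct OutputBuffer {
    std::vector<float> slots;   // pre-allocated storage, never resized
    std::atomic<int> count{0};  // number of slots requested so far

    explicit OutputBuffer(std::size_t capacity) : slots(capacity) {}
};

// Body of one hypothetical work-item: emit prob only if it passes the threshold.
// fetch_add reserves a unique slot without a lock; a request past the capacity
// is dropped, but count still records it so the host can detect the overflow.
void workItem(OutputBuffer& out, float prob, float threshold) {
    if (prob < threshold) return;
    int slot = out.count.fetch_add(1);
    if (slot < static_cast<int>(out.slots.size())) {
        out.slots[static_cast<std::size_t>(slot)] = prob;
    }
}

// Number of results actually written to the buffer.
std::size_t storedCount(const OutputBuffer& out) {
    int n = out.count.load();
    assert(n >= 0);
    std::size_t requested = static_cast<std::size_t>(n);
    return requested < out.slots.size() ? requested : out.slots.size();
}

// True if some work-items could not store their result.
bool overflowed(const OutputBuffer& out) {
    return static_cast<std::size_t>(out.count.load()) > out.slots.size();
}
```

On a real device the order in which slots are handed out is nondeterministic, so the stored results would still need a sort or top-K pass on the host; if `overflowed` returns true, one option is to rerun with a larger buffer.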

If the function is large enough, you may be able to port the calls to this->Prob(...) instead. You would need to batch up a bunch of calls by storing their parameters in a suitable data structure. By "a bunch" I mean at least hundreds, but ideally thousands or more. Again, this is only worth it if this->Prob() runs the same code for every call and is complex enough to be worth the round-trip to the OpenCL device and back.
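Batching like this could be organized as a small queue that stores the parameters of each deferred `Prob()` call and hands a whole batch to a single function once enough requests have accumulated; that function is where the upload, kernel launch and download would go. A minimal sketch, assuming the scores can be consumed after the batch runs (so InsertCandidate would be applied to `results()` after the final flush rather than inside the loop) and that the batch function returns one score per request. `ProbRequest`, `ProbBatcher` and their fields are hypothetical, and the class is not thread-safe, so each OpenMP thread would need its own instance.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Parameters of one deferred Prob() call (hypothetical fields), stored
// contiguously so a whole batch can be copied to a device in one transfer.
struct ProbRequest {
    int sampleIndex;
    float threshold;
};

// Queues requests and runs them in batches through a single batch function.
class ProbBatcher {
public:
    struct Result {
        int sampleIndex;
        float score;
    };
    // Stand-in for "upload batch, run kernel, download scores".
    using BatchFn = std::function<std::vector<float>(const std::vector<ProbRequest>&)>;

    ProbBatcher(std::size_t batchSize, BatchFn fn)
        : batchSize_(batchSize), fn_(std::move(fn)) {}

    // Queue one request; runs a batch as soon as batchSize requests are pending.
    void enqueue(const ProbRequest& r) {
        pending_.push_back(r);
        if (pending_.size() >= batchSize_) flush();
    }

    // Run whatever is still pending; call once after the last enqueue.
    void flush() {
        if (pending_.empty()) return;
        std::vector<float> scores = fn_(pending_);
        assert(scores.size() == pending_.size() && "one score per request expected");
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            results_.push_back({pending_[i].sampleIndex, scores[i]});
        }
        pending_.clear();
        ++batchesRun_;
    }

    const std::vector<Result>& results() const { return results_; }
    std::size_t batchesRun() const { return batchesRun_; }

private:
    std::size_t batchSize_;
    BatchFn fn_;
    std::vector<ProbRequest> pending_;
    std::vector<Result> results_;
    std::size_t batchesRun_ = 0;
};
```

In practice `batchSize` would be in the hundreds or thousands, since a small batch would not amortize the cost of the round-trip to the device.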

Upvotes: 2
