Reputation: 43
I'm wondering how to convert the following OpenMP program to an OpenCL program.
The parallel section of the algorithm, implemented using OpenMP, looks like this:
#pragma omp parallel
{
    int thread_id = omp_get_thread_num();
    //double mt_probThreshold = mt_nProbThreshold_;
    double mt_probThreshold = nProbThreshold;
    int mt_nMaxCandidate = mt_nMaxCandidate_;
    double mt_nMinProb = mt_nMinProb_;
    int has_next = 1;
    std::list<ScrBox3d> mt_detected;
    ScrBox3d sample;
    while(has_next) {
        #pragma omp critical
        { // '{' is very important and define the block of code that needs lock.
            // Don't remove this pair of '{' and '}'.
            if(piter_ == box_.end()) {
                has_next = 0;
            } else{
                sample = *piter_;
                ++piter_;
            }
        } // '}' is very important and define the block of code that needs lock.
        if(has_next){
            this->SetSample(&sample, thread_id);
            //UpdateSample(sample, thread_id); // May be necessary for more sophisticated features
            sample._prob = (float)this->Prob( true, thread_id, mt_probThreshold);
            //sample._prob = (float)_clf->LogLikelihood( thread_id);
            InsertCandidate( mt_detected, sample, mt_probThreshold, mt_nMaxCandidate, mt_nMinProb );
        }
    }
    #pragma omp critical
    { // '{' is very important and define the block of code that needs lock.
        // Don't remove this pair of '{' and '}'.
        if(mt_detected_.size()==0) {
            mt_detected_ = mt_detected;
            //mt_nProbThreshold_ = mt_probThreshold;
            nProbThreshold = mt_probThreshold;
        } else {
            for(std::list<ScrBox3d>::iterator it = mt_detected.begin();
                it!=mt_detected.end(); ++it)
                InsertCandidate( mt_detected_, *it, /*mt_nProbThreshold_*/nProbThreshold,
                                 mt_nMaxCandidate_, mt_nMinProb_ );
        }
    } // '}' is very important and define the block of code that needs lock.
}//parallel section end
My question is: can this section be implemented with OpenCL? I followed a series of OpenCL tutorials and understood how it works. I was writing the code in .cu files (I had previously installed the CUDA toolkit), but this case is more complicated because the code uses a lot of header files, template classes and object-oriented programming.
How could I convert this section implemented in OpenMP to OpenCL? Should I create a new .cu file?
Any advice could help. Thanks in advance.
Using the VS profiler, I noticed that most of the execution time is spent in the InsertCandidate() function, so I'm thinking about writing a kernel to execute this function on the GPU. The most expensive operation in this function is a for loop. But as can be seen, each iteration of the loop contains 3 if statements, which can lead to divergence and therefore serialization, even when executed on the GPU:
for( iter = detected.begin(); iter != detected.end(); iter++ )
{
    if( nCandidate == nMaxCandidate-1 )
        nProbThreshold = iter->_prob;
    if( box._prob >= iter->_prob )
        break;
    if( nCandidate >= nMaxCandidate && box._prob <= nMinProb )
        break;
    nCandidate ++;
}
In conclusion, can this program be converted to OpenCL?
Upvotes: 3
Views: 725
Reputation: 5087
It may be possible to convert your sample code to OpenCL; however, I spotted a couple of issues with doing so.
If the function is large enough, you may be able to port the calls to this->Prob(...) instead. You need to be able to cache up 'a bunch of calls' by storing the parameters in a suitable data structure. By 'a bunch' I mean at least hundreds, but ideally thousands or more. Again, this is only worth it if this->Prob() is constant for all calls, and complex enough to be worth the round trip to the OpenCL device and back.
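As a rough illustration of the caching idea, here is a minimal host-side C++ sketch that queues the parameters of many calls and hands them over in blocks. Everything in it is hypothetical: ProbParams, its fields and ProbBatcher are names made up for this example, since the question does not show the layout of ScrBox3d or what Prob() actually reads. The flush callback stands in for the device round trip; in a real port it would presumably write the block to a device buffer, launch the kernel and read the results back.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical flat parameter record for one Prob() call. The real ScrBox3d
// layout is not shown in the question, so these fields are placeholders.
struct ProbParams {
    float x, y, z;   // assumed box position
    float scale;     // assumed box size
};

// Collects parameters and passes them to `flush` in blocks of `capacity`.
class ProbBatcher {
public:
    using FlushFn = std::function<void(const std::vector<ProbParams>&)>;

    ProbBatcher(std::size_t capacity, FlushFn flush)
        : capacity_(capacity), flush_(std::move(flush)) {
        pending_.reserve(capacity_);
    }

    // Queue one call; dispatch the whole block once it is full.
    void Add(const ProbParams& p) {
        pending_.push_back(p);
        if (pending_.size() >= capacity_) Flush();
    }

    // Dispatch whatever is still queued (call once after the last Add).
    void Flush() {
        if (pending_.empty()) return;
        flush_(pending_);   // placeholder for the device round trip
        pending_.clear();
    }

private:
    std::size_t capacity_;
    FlushFn flush_;
    std::vector<ProbParams> pending_;
};
```

A plain array of small structs like this is easy to copy to a device in one transfer, which is the point of batching: the fixed cost of each round trip is spread over thousands of evaluations instead of being paid per sample.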
Upvotes: 2