Reputation: 25
void calc_mean(float *left_mean, float *right_mean,
               const uint8_t* left, const uint8_t* right,
               int32_t block_width, int32_t block_height, int32_t d,
               uint32_t w, uint32_t h, int32_t i, int32_t j)
{
    *left_mean = 0;
    *right_mean = 0;
    int32_t i_b;
    float local_left = 0, local_right = 0;
    for (i_b = -(block_height-1)/2; i_b < (block_height-1)/2; i_b++) {
        #pragma omp parallel for reduction(+:local_left,local_right)
        for (int32_t j_b = -(block_width-1)/2; j_b < (block_width-1)/2; j_b++) {
            // Borders checking
            if (!(i+i_b >= 0) || !(i+i_b < h) || !(j+j_b >= 0) || !(j+j_b < w) || !(j+j_b-d >= 0) || !(j+j_b-d < w)) {
                continue;
            }
            // Calculating indices of the block within the whole image
            int32_t ind_l = (i+i_b)*w + (j+j_b);
            int32_t ind_r = (i+i_b)*w + (j+j_b-d);
            // Updating the block means
            //*left_mean += *(left+ind_l);
            //*right_mean += *(right+ind_r);
            local_left += left[ind_l];
            local_right += right[ind_r];
        }
    }
    *left_mean = local_left/(block_height * block_width);
    *right_mean = local_right/(block_height * block_width);
}
With this change the program now runs slower than the non-threaded version. I also tried adding private(left, right), but that leads to a bad memory access at ind_l.
Upvotes: 1
Views: 72
Reputation: 808
I think this should get you closer to what you want, although I'm not quite sure about one final part.
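float local_left = 0, local_right = 0;  // both accumulators must start at zero
for (int32_t i_b = -(block_height-1)/2; i_b < (block_height-1)/2; i_b++) {
    // CORES is a placeholder for a chunk size you define; schedule() is optional.
    // The reductions apply to the local accumulators, not to the output pointers.
    #pragma omp parallel for schedule(static, CORES) reduction(+:local_left) reduction(+:local_right)
    for (int32_t j_b = -(block_width-1)/2; j_b < (block_width-1)/2; j_b++) {
        // Same border checks as in the question
        if (!(i+i_b >= 0) || !(i+i_b < h) || !(j+j_b >= 0) || !(j+j_b < w) || !(j+j_b-d >= 0) || !(j+j_b-d < w)) continue;
        int32_t ind_l = (i+i_b)*w + (j+j_b);
        int32_t ind_r = (i+i_b)*w + (j+j_b-d);
        local_left += *(left+ind_l);
        local_right += *(right+ind_r);
    }
}
*left_mean = local_left/(block_height * block_width);
*right_mean = local_right/(block_height * block_width);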
The part I am unsure of is whether you need the schedule() clause, and how to write two different reductions. I know that for a single reduction you can simply do
reduction(+:local_left)
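On the two-reductions point: OpenMP lets one reduction clause list several variables that share the same operator, which is also what the question's own pragma does. Here is a minimal sketch of that form. The helper name sum_pair and its parameters are made up for illustration and are not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): sums two byte arrays of
   equal length n in a single parallel loop. */
void sum_pair(const uint8_t *a, const uint8_t *b, int32_t n,
              float *sum_a, float *sum_b)
{
    assert(n >= 0);
    /* Both accumulators are initialised; each thread gets a private
       zero-initialised copy that is added back in at the end. */
    float acc_a = 0.0f, acc_b = 0.0f;

    /* One clause, two variables, same '+' operator. */
    #pragma omp parallel for reduction(+:acc_a, acc_b)
    for (int32_t k = 0; k < n; k++) {
        acc_a += a[k];
        acc_b += b[k];
    }

    *sum_a = acc_a;
    *sum_b = acc_b;
}
```

Writing reduction(+:acc_a) reduction(+:acc_b) as two separate clauses should behave the same way. Note that the reduction variables are the local floats, not pointers such as left_mean.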
EDIT: Here is a reference for schedule(): http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-loop.html#Loopschedules It looks like you do not need it, but using it could improve the runtime.
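A possible reason for the slowdown, independent of schedule(): with the pragma on the inner loop, a parallel region is opened once per block row, and each row is only a few pixels wide, so the threading overhead can easily outweigh the work. One option worth trying is to parallelise both block loops at once with collapse(2). The sketch below assumes the same loop bounds and divisor as the question, takes w and h as int32_t (my assumption, to avoid signed/unsigned comparisons in the border checks), and uses the made-up name calc_mean_collapsed.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: same arithmetic as calc_mean in the question, but with a
   single parallel region spanning both block loops. */
void calc_mean_collapsed(float *left_mean, float *right_mean,
                         const uint8_t *left, const uint8_t *right,
                         int32_t block_width, int32_t block_height,
                         int32_t d, int32_t w, int32_t h,
                         int32_t i, int32_t j)
{
    assert(block_width > 0 && block_height > 0);
    const int32_t half_h = (block_height - 1) / 2;
    const int32_t half_w = (block_width - 1) / 2;
    float local_left = 0.0f, local_right = 0.0f;

    /* collapse(2) merges the two perfectly nested loops into one
       iteration space that is split across the threads. */
    #pragma omp parallel for collapse(2) reduction(+:local_left, local_right)
    for (int32_t i_b = -half_h; i_b < half_h; i_b++) {
        for (int32_t j_b = -half_w; j_b < half_w; j_b++) {
            const int32_t row   = i + i_b;
            const int32_t col_l = j + j_b;
            const int32_t col_r = col_l - d;
            /* Skip pixels that fall outside either image. */
            if (row < 0 || row >= h || col_l < 0 || col_l >= w ||
                col_r < 0 || col_r >= w) {
                continue;
            }
            local_left  += left[row * w + col_l];
            local_right += right[row * w + col_r];
        }
    }

    *left_mean  = local_left  / (float)(block_height * block_width);
    *right_mean = local_right / (float)(block_height * block_width);
}
```

With GCC or Clang this needs -fopenmp; without the flag the pragma is ignored and the function runs serially. If calc_mean is called once per pixel, a single block may still be too little work to thread, and putting the pragma on the loop over pixels in the caller is likely the better place. That is a guess and would need measuring.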
Upvotes: 2