Reputation: 21
I wrote a program that multiplies a vector by a matrix. The matrix has periodically repeated cells, so I use a temporary variable to sum the vector elements before multiplying. The period is the same for adjacent rows. I create a separate temporary variable for each thread. sizeof(InnerVector) == 400, and I don't want to allocate memory for it on every iteration (600 times).
The code looks something like this:
int tempsSize = omp_get_max_threads();
InnerVector *temps = new InnerVector[tempsSize];
for(int k = 0; k < tempsSize; k++)
    InnerVector_init(temps[k]);
for(int jmin = 1, jmax = 2; jmax < matrixSize/2; jmin *= 2, jmax *= 2)
{
    int period = getPeriod(jmax);
    #pragma omp parallel
    {
        int threadNum = omp_get_thread_num();
        // printf("\n threadNum = %i", threadNum);
        #pragma omp for
        for(int j = jmin; j < jmax; j++)
        {
            InnerVector_reset(temps[threadNum]);
            for(int i = jmin; i < jmax; i++)
            {
                InnerMatrix cell = getCell(i, j);
                if(temps[threadNum].IsZero)
                    for(int k = j; k < matrixSize; k += period)
                        InnerVector_add(temps[threadNum], temps[threadNum], v[k]);
                InnerVector_add_mul(v_res[i], cell, temps[threadNum]);
            }
        }
    }
}
The code looks correct to me, but I get wrong results. In fact, I get different results on different runs, and sometimes the result is correct.
When I compile in debug mode, the result is always correct. When I uncomment the line with "printf", the result is also always correct.
P.S. I use Visual Studio 2010.
Upvotes: 2
Views: 1134
Reputation: 12774
I suspect there might be a data race in
InnerVector_add_mul(v_res[i], cell, temps[threadNum]);
Since v_res appears to be the result vector, and i runs from jmin to jmax in every iteration of the parallelized loop over j, multiple threads can write to v_res[i] for the same value of i at the same time, with unpredictable results.
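One possible way to avoid the race, assuming that is indeed the cause, is to let each thread accumulate into its own private result and merge it into the shared vector once, inside a critical section. The sketch below is only an illustration under simplifying assumptions: plain double values stand in for InnerVector and InnerMatrix, the getCell here is a made-up stand-in for the real one, and the names multiplyPrivate and multiplySerial are hypothetical. It uses only pragmas, so it also compiles (and runs serially) when OpenMP is disabled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the question's getCell(i, j); it returns small
// integer values so the sums are exact regardless of summation order.
static double getCell(int i, int j) { return 1.0 + (i * 7 + j * 3) % 5; }

// Serial reference: res[i] += cell(i, j) * v[j], same loop order as the question.
void multiplySerial(const std::vector<double>& v, std::vector<double>& res) {
    assert(res.size() == v.size());
    const int n = static_cast<int>(v.size());
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            res[i] += getCell(i, j) * v[j];
}

// Parallel over j as in the question, but each thread writes only to its own
// `local` vector; the shared `res` is touched only inside the critical section.
void multiplyPrivate(const std::vector<double>& v, std::vector<double>& res) {
    assert(res.size() == v.size());
    const int n = static_cast<int>(v.size());
    #pragma omp parallel
    {
        std::vector<double> local(n, 0.0);  // declared inside the region, so private
        #pragma omp for
        for (int j = 0; j < n; j++) {
            const double temp = v[j];       // plays the role of temps[threadNum]
            for (int i = 0; i < n; i++)
                local[i] += getCell(i, j) * temp;
        }
        // One thread at a time merges its partial result into the shared vector.
        #pragma omp critical
        for (int i = 0; i < n; i++)
            res[i] += local[i];
    }
}
```

This keeps the per-j temporary reusable across all i, at the cost of one extra result-sized buffer per thread. If that buffer is too large, guarding the update of v_res[i] itself (for example with a critical section around InnerVector_add_mul) should also remove the race, though probably with more contention.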
Upvotes: 3