Reputation: 1239
The code multiplies a sparse matrix stored in compressed column storage by a column vector. The first listing is the serial code; the second is the OpenCL kernel. I have used more meaningful names instead of inputimage and output.
I had to parallelize my code, but the serial output is different from the kernel's output. Can somebody please tell me what I am missing?
The serial code is:
    int result[4] = {0, 0, 0, 0};
    for (int col = 0; col < 4; col++)
    {
        for (int j = rowptr[col]; j < rowptr[col + 1]; j++)
        {
            result[index[j]] += val[j] * colvector[col];
        }
    }
Its output is different from that of the parallel code. The number of work items is set to 4. The parallel code is given below. Can somebody please tell me what I am missing?
    int col = get_global_id(0);
    for (int j = rowptr[col]; j < rowptr[col + 1]; j++)
    {
        result[index[j]] += val[j] * colvector[col];
    }
Upvotes: 1
Views: 266
Reputation: 9906
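In OpenCL, the 4 work items are executed in parallel. Each one does a read-modify-write on result[index[j]] (output2[inputImage4[j]] in the original naming), and whenever two columns have a nonzero in the same row, two work items will try to update the same element at the same time. This is a data race, so the behavior is undefined, but what you will probably observe is that contributions get lost, for example only one work item's contribution surviving for a shared row.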
Solving this kind of issue requires either a modification of the algorithm or the use of atomic operations (which serialize the updates); atomics are a reasonable choice if you don't access the same value too often.
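As a rough illustration of the "modify the algorithm" route, here is a minimal sketch in plain C, not a tuned implementation. It assumes val and colvector hold int values (the question does not say) and keeps the question's rowptr name for the array of column start offsets. The function names spmv_ccs_scatter and spmv_ccs_gather_row are made up for this example. The idea is that each work item owns one output row and accumulates into a private sum, so no two work items ever write the same element.

```c
/* Serial reference: the scatter loop from the question.
 * rowptr[col] .. rowptr[col + 1] delimits the nonzeros of column col. */
void spmv_ccs_scatter(int ncols, const int *rowptr, const int *index,
                      const int *val, const int *colvector, int *result)
{
    for (int col = 0; col < ncols; col++) {
        for (int j = rowptr[col]; j < rowptr[col + 1]; j++) {
            result[index[j]] += val[j] * colvector[col];
        }
    }
}

/* Race-free variant (hypothetical helper): the work one work item would
 * do for a single output row. In a kernel, row would come from
 * get_global_id(0) and the return value would be stored to result[row]. */
int spmv_ccs_gather_row(int row, int ncols, const int *rowptr,
                        const int *index, const int *val,
                        const int *colvector)
{
    int sum = 0; /* private accumulator, never shared between work items */
    for (int col = 0; col < ncols; col++) {
        for (int j = rowptr[col]; j < rowptr[col + 1]; j++) {
            if (index[j] == row) {
                sum += val[j] * colvector[col];
            }
        }
    }
    return sum;
}
```

The trade-off is that every row scans all nonzeros, so this does more total work than the scatter loop; converting the matrix to compressed row storage on the host would avoid that. The atomic route would instead keep the original kernel loop and replace the += with atomic_add(&result[index[j]], val[j] * colvector[col]), assuming result is a __global int buffer as in the serial code.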
Upvotes: 3