zzzzz

Reputation: 1239

What's wrong with my OpenCL kernel?

The code multiplies a sparse matrix, stored in compressed column storage, by a column vector. The first listing is the serial code and the second is the OpenCL kernel. I have used more meaningful names in place of inputImage and output.

I had to parallelize my code, but the serial output differs from the kernel's output. Can somebody please tell me what I am missing?

The serial code is:

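    int result[4] = {0, 0, 0, 0};
    for (int col = 0; col < 4; col++)
    {
        /* rowptr[col] .. rowptr[col+1]-1 are the nonzeros of column col;
           index[j] is the row of nonzero j */
        for (int j = rowptr[col]; j < rowptr[col+1]; j++)
        {
            result[index[j]] += val[j] * colvector[col];
        }
    }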

Its output differs from that of the parallel code. The kernel is launched with 4 work items. The parallel code is given below. Can somebody please tell me what I am missing?

    int col = get_global_id(0);

    for (int j = rowptr[col]; j < rowptr[col+1]; j++)
    {
        result[index[j]] += val[j] * colvector[col];
    }

Upvotes: 1

Views: 266

Answers (1)

Eric Bainville

Reputation: 9906

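In OpenCL, the 4 work items execute in parallel. Whenever two columns have a nonzero in the same row, their work items try to update the same element result[index[j]] (output2[inputImage4[j]] in the original code) at the same time, and the += is a non-atomic read-modify-write. The behavior is undefined, but what you will probably observe is the contribution of only one of the competing work items.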
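To make the lost update concrete, here is a minimal sketch in plain C that replays one possible interleaving of two work items targeting the same row. The function name and the contribution values 3 and 5 are hypothetical, chosen only for illustration; real hardware may interleave the reads and writes differently.

```c
/* Simulates ONE possible interleaving of two work items whose nonzeros
   fall in the same row. Real hardware may interleave differently. */
static int lost_update_demo(void)
{
    int result0 = 0;                 /* stands for the shared result[index[j]] */
    const int contrib[2] = {3, 5};   /* hypothetical val[j] * colvector[col] per work item */
    int loaded[2];

    for (int w = 0; w < 2; w++)      /* step 1: both work items read the old value (0) */
        loaded[w] = result0;
    for (int w = 0; w < 2; w++)      /* step 2: each writes back old value + its own contribution */
        result0 = loaded[w] + contrib[w];

    return result0;                  /* 5 in this interleaving; the serial loop would give 8 */
}
```

The second store overwrites the first, so the contribution of 3 is lost.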

Solving this kind of issue requires either a modification of the algorithm or the use of atomic operations (which serialize the updates), provided you don't access the value too often.
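One way to modify the algorithm, sketched below in plain C, is to regroup the nonzeros by row on the host so that each work item owns exactly one element of result and no two work items write to the same location. The helper names ccs_to_crs and spmv_row are hypothetical, colptr and rowidx stand for rowptr and index in the question, and the sketch assumes int data, a 4x4 matrix, and colptr[0] == 0. The atomic route would instead keep the kernel from the question and replace the += with atomic_add, which OpenCL 1.1 provides for 32-bit integers in global memory.

```c
#define N 4  /* matrix dimension used in the question */

/* Hypothetical helper: regroup the column-compressed nonzeros by row.
   Assumes colptr[0] == 0, so colptr[N] is the number of nonzeros. */
static void ccs_to_crs(const int *colptr, const int *rowidx, const int *val,
                       int *rowstart, int *colidx, int *rowval)
{
    int next[N];
    int nnz = colptr[N];

    for (int r = 0; r <= N; r++) rowstart[r] = 0;
    for (int j = 0; j < nnz; j++) rowstart[rowidx[j] + 1]++;     /* count nonzeros per row */
    for (int r = 0; r < N; r++) rowstart[r + 1] += rowstart[r];  /* prefix sum gives row starts */
    for (int r = 0; r < N; r++) next[r] = rowstart[r];

    for (int col = 0; col < N; col++) {
        for (int j = colptr[col]; j < colptr[col + 1]; j++) {
            int dst = next[rowidx[j]]++;   /* next free slot in this nonzero's row */
            colidx[dst] = col;
            rowval[dst] = val[j];
        }
    }
}

/* Body of one hypothetical work item (row would come from get_global_id(0)):
   it is the only writer of result[row], so no atomics are needed. */
static void spmv_row(int row, const int *rowstart, const int *colidx,
                     const int *rowval, const int *colvector, int *result)
{
    int sum = 0;
    for (int j = rowstart[row]; j < rowstart[row + 1]; j++)
        sum += rowval[j] * colvector[colidx[j]];
    result[row] = sum;
}
```

Calling spmv_row once per row, in any order or in parallel, should reproduce the serial result, at the cost of a one-time conversion on the host.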

Upvotes: 3
