user3226109

Reputation: 13

How to calculate histogram using shared memory

I want to calculate a histogram using shared memory and wrote the kernel below, but the CUDA and CPU results differ: the CUDA result is blockDim.x times the CPU result. Why?
How can I fix it? Thank you!

__global__ void CalHistKernel(int*imgData,int*bins,int datasize)
{
    __shared__ int _bins[3];

    int tx=threadIdx.x;
    int idx=blockIdx.x*blockDim.x+threadIdx.x;//blockDim.y=1
    if(tx<3)
    {
        _bins[tx]=0;       
    }
    __syncthreads();

    if(idx<datasize)
    {       
        atomicAdd((int*)&_bins[imgData[idx]],1);     
    }
    __syncthreads();
    for(int i=0;i<3;i++)   
        atomicAdd((int*)&bins[i],_bins[i]);
}

Upvotes: 1

Views: 1750

Answers (1)

Roger Dahl

Reputation: 15734

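It looks like it's due to the final loop, where you add the value in each shared memory bin to the global memory bins. Remember that the loop is executed by every thread in the block, so each block adds its shared counts to the global bins blockDim.x times instead of once.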
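
One possible fix, as a minimal sketch rather than a definitive implementation: let only the first three threads of each block write back to global memory, one bin per thread. The NUM_BINS constant, the host-side test harness in main(), the input size, and the launch configuration are my own assumptions for illustration and are not from the question. The sketch also assumes that blockDim.x is at least 3 and that every value in imgData lies in the range 0 to 2.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define NUM_BINS 3  // assumed name; the question hard-codes 3

__global__ void CalHistKernel(const int *imgData, int *bins, int datasize)
{
    __shared__ int _bins[NUM_BINS];

    int tx  = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // blockDim.y == 1

    // Zero the per-block histogram (needs blockDim.x >= NUM_BINS).
    if (tx < NUM_BINS)
        _bins[tx] = 0;
    __syncthreads();

    // Each thread counts one element into the shared bins.
    if (idx < datasize)
        atomicAdd(&_bins[imgData[idx]], 1);
    __syncthreads();

    // Only NUM_BINS threads per block write back, one bin each,
    // so each block's counts reach global memory exactly once.
    if (tx < NUM_BINS)
        atomicAdd(&bins[tx], _bins[tx]);
}

int main()
{
    // Hypothetical test data: values cycle through 0, 1, 2.
    const int datasize = 1000;
    const int threads  = 256;
    const int blocks   = (datasize + threads - 1) / threads;

    std::vector<int> h_img(datasize);
    int h_ref[NUM_BINS] = {0, 0, 0};
    for (int i = 0; i < datasize; ++i) {
        h_img[i] = i % NUM_BINS;
        ++h_ref[h_img[i]];  // CPU reference histogram
    }

    int *d_img = nullptr, *d_bins = nullptr;
    cudaMalloc(&d_img, datasize * sizeof(int));
    cudaMalloc(&d_bins, NUM_BINS * sizeof(int));
    cudaMemcpy(d_img, h_img.data(), datasize * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(int));  // global bins start at zero

    CalHistKernel<<<blocks, threads>>>(d_img, d_bins, datasize);

    int h_bins[NUM_BINS];
    cudaMemcpy(h_bins, d_bins, NUM_BINS * sizeof(int),
               cudaMemcpyDeviceToHost);  // blocking copy; runs after the kernel

    for (int i = 0; i < NUM_BINS; ++i)
        printf("bin %d: gpu=%d cpu=%d\n", i, h_bins[i], h_ref[i]);

    cudaFree(d_img);
    cudaFree(d_bins);
    return 0;
}
```

Guarding the write-back with tx < NUM_BINS mirrors the guard already used to zero the shared bins, and it spreads the global atomicAdd calls across threads. Having thread 0 alone run the original loop would also work.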

Upvotes: 3
