user570593

Reputation: 3520

Histogram calculations in cuda

The kernel below does not work, but when I comment out the atomicAdd call, the code works.

What is the reason for that?
Where can I get histogram code for a float array?

    __global__ void calculateHistogram(float *devD, int *retHis)
    {
        int globalLi = getCurrentThread(); // get the thread ID
        if (globalLi >= 0 && globalLi < Rd * Cd * Dd)
        {
            int r = 0, c = 0, d = 0;
            GetInd2Sub(globalLi, Rd, Cd, r, c, d); // some calculations to get r, c, d
            if (r >= stYd && r < edYd && c >= stXd && c < edXd && d >= stZd && d < edZd)
            {
                // calculate the histogram
                int indexInHis = GetBinNo(devD[globalLi]); // get the bin number in the histogram
                atomicAdd(&retHis[indexInHis], 1); // when I comment this line the code works
            }
        }
    }

Upvotes: 0

Views: 1942

Answers (1)

Nathan Whitehead

Reputation: 2012

Take a look at chapter 9 of CUDA by Example by Jason Sanders and Edward Kandrot. It covers atomics and goes through a simple example that computes histograms of 8-bit integers. The first version uses an atomic add on global memory for each value, which works but is very slow. The refined version computes a histogram for each block in shared memory, then merges all the per-block histograms into global memory to get the final result. Your code is like the first version. Once you get it working, you will want to make it more like the fast refined version.
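To make the per-block idea concrete for a float array, here is a minimal sketch. It is not the book's code: `NUM_BINS`, `binOf`, `histogramShared` and `histogramOnGpu` are hypothetical names, and the binning rule (values in [0, 1) spread uniformly over the bins, everything else clamped to the first or last bin) is only an assumption standing in for your `GetBinNo`. Error checking is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 16; // assumption: hypothetical bin count

// Hypothetical binning rule: [0, 1) is split uniformly into NUM_BINS bins.
// Negative values and NaN go to bin 0, values >= 1 go to the last bin,
// so the result is always a valid index.
__host__ __device__ inline int binOf(float v)
{
    if (!(v >= 0.0f)) return 0;           // negatives and NaN
    if (v >= 1.0f) return NUM_BINS - 1;   // too large
    return static_cast<int>(v * NUM_BINS);
}

__global__ void histogramShared(const float *data, int n, int *hist)
{
    // One private histogram per block, kept in shared memory.
    __shared__ int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;
    __syncthreads();

    // Grid-stride loop: each thread bins several elements.
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[binOf(data[i])], 1);
    __syncthreads();

    // Merge this block's histogram into the global one.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);
}

// Host helper: copies the data to the device, runs the kernel, returns the bins.
std::vector<int> histogramOnGpu(const std::vector<float> &host)
{
    std::vector<int> result(NUM_BINS, 0);
    int n = static_cast<int>(host.size());
    if (n == 0) return result;

    float *dData = nullptr;
    int *dHist = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dHist, NUM_BINS * sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, NUM_BINS * sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 64) blocks = 64; // the grid-stride loop covers the rest

    histogramShared<<<blocks, threads>>>(dData, n, dHist);
    cudaMemcpy(result.data(), dHist, NUM_BINS * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dHist);
    return result;
}
```

Calling `histogramOnGpu` on a `std::vector<float>` returns a vector of `NUM_BINS` counts. The shared-memory atomics only contend within one block, and global memory sees at most `NUM_BINS` atomic adds per block instead of one per element.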

You can download the examples from the book to see both versions: CUDA by Example downloads

You don't appear to give complete code or error messages, so I can't say exactly what is going wrong in your code. Here are some thoughts:

  • You need to compile for an architecture that supports atomics, i.e. something newer than the default 1.0 target (integer atomics on global memory need compute capability 1.1 or higher).
  • The indexing and index limits look somewhat complicated. I would double-check those.
  • Your bin calculation might be returning bin numbers outside the valid range for retHis. I would add some checks before using the return value, at least for debugging.
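As a debugging aid for the last point, something along these lines may help. It is only a sketch: `rawBin` is a hypothetical stand-in for your `GetBinNo` (I don't know what yours does), and `checkedHistogram` and `runCheckedHistogram` are made-up names. The kernel counts out-of-range bin numbers instead of writing through them, and the host code asks the runtime whether the launch or the kernel itself failed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for GetBinNo: may return values outside [0, numBins).
__device__ int rawBin(float v, int numBins)
{
    return static_cast<int>(floorf(v * numBins));
}

__global__ void checkedHistogram(const float *data, int n,
                                 int *hist, int numBins, int *badCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int bin = rawBin(data[i], numBins);
    if (bin < 0 || bin >= numBins) {
        atomicAdd(badCount, 1);   // record the problem, do not touch hist
        return;
    }
    atomicAdd(&hist[bin], 1);
}

// Fills hist (its size is the bin count) and returns the number of
// out-of-range bin indices, or -1 if the CUDA runtime reported an error.
int runCheckedHistogram(const std::vector<float> &host, std::vector<int> &hist)
{
    int n = static_cast<int>(host.size());
    int numBins = static_cast<int>(hist.size());
    if (n == 0 || numBins == 0) return 0;

    float *dData = nullptr;
    int *dHist = nullptr;
    int *dBad = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dHist, numBins * sizeof(int));
    cudaMalloc(&dBad, sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, numBins * sizeof(int));
    cudaMemset(dBad, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    checkedHistogram<<<blocks, threads>>>(dData, n, dHist, numBins, dBad);

    // Launch errors show up in cudaGetLastError, errors raised while the
    // kernel runs show up when synchronizing.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int bad = -1;
    if (err == cudaSuccess) {
        cudaMemcpy(&bad, dBad, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(hist.data(), dHist, numBins * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dData);
    cudaFree(dHist);
    cudaFree(dBad);
    return bad;
}
```

If the returned count is non-zero, the bin function is producing indices that would have written outside retHis. If the function returns -1, printing `cudaGetErrorString(err)` at that point should give you the error message that is missing from the question.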

Upvotes: 3
