Iterating over the second (inner) loop with a sum reduction in CUDA

Question

I have to parallelize this code, porting it from C++ to CUDA C:

  for(ihist = 0; ihist < numhist; ihist++){ 
      for(iwin = 0; iwin



In CUDA C, using a sum reduction:

extern __shared__ float sdata[];
  int tx = threadIdx.x;
  int i=blockIdx.x;
  int j=blockIdx.y;
  float sum=0.0;
  float temp=0.0;
  temp=U[j];


   if(tx 0;offset >>= 1)
  {
   if(tx < offset)
   {
    // add a partial sum upstream to our own
    sdata[tx] += sdata[tx + offset];
   }
   __syncthreads();
  }

   // finally, thread 0 writes the result
  if(threadIdx.x == 0)
  {
   // note that the result is per-block
   // not per-thread
   denwham[i] = sdata[0];

    for(int k=0;k


And I configure the launch like this:

 int smem_sz = (256)*sizeof(float);
  dim3 Block(numhist,numhist,1);
  NewProbabilitiesKernel<<>>(...);
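A kernel like the one above combines two steps: each thread accumulates a private partial sum while looping over U, then the block combines those partials with a shared-memory tree reduction. A minimal way to check that indexing on the host is a single-threaded C++ emulation of one block, compared against the plain sequential loop. This is a sketch under stated assumptions, not the original code: the summand `exp(-U[k])` is hypothetical (the real expression inside `exp` was lost from the post), the helper names `term`, `sequentialSum` and `emulatedBlockSum` are made up for illustration, and the block size is assumed to be a power of two (256 would match `smem_sz`).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summand: the expression inside exp() in the original loop
// was lost, so exp(-U[k]) stands in for it here.
static float term(const std::vector<float>& U, int k) {
    return std::exp(-U[k]);
}

// Sequential reference, as a plain C++ loop would compute it.
float sequentialSum(const std::vector<float>& U) {
    float sum = 0.0f;
    for (int k = 0; k < (int)U.size(); ++k) sum += term(U, k);
    return sum;
}

// Single-threaded emulation of one CUDA block with blockSize threads.
float emulatedBlockSum(const std::vector<float>& U, int blockSize) {
    // The tree reduction below assumes a power-of-two block size.
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);
    std::vector<float> sdata(blockSize);  // stands in for extern __shared__ float sdata[]
    const int n = (int)U.size();

    // Phase 1: "thread" tx walks U with stride blockSize, keeping a private
    // partial sum. Every slot of sdata is written, even when tx >= n.
    for (int tx = 0; tx < blockSize; ++tx) {
        float partial = 0.0f;
        for (int k = tx; k < n; k += blockSize) partial += term(U, k);
        sdata[tx] = partial;
    }
    // In the kernel, a __syncthreads() is needed here before reading sdata.

    // Phase 2: tree reduction, halving the active range on each step.
    for (int offset = blockSize / 2; offset > 0; offset >>= 1) {
        for (int tx = 0; tx < offset; ++tx) sdata[tx] += sdata[tx + offset];
        // In the kernel, __syncthreads() goes here, outside the if(tx < offset).
    }
    return sdata[0];  // thread 0 would write this to global memory
}
```

Because the block adds the terms in a different order than the sequential loop, the two results can differ in the last few bits of a `float`, so a relative tolerance (rather than exact equality) is the appropriate comparison; a block size of 1 reproduces the sequential order exactly.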


My problem is that I cannot iterate over U inside the exp. I have tried the following:

a) a for/while loop inside the kernel that iterates over U
b) iterating per thread
c) iterating per block


All these attempts give different results between the C++ code and the CUDA code. The code works fine if I replace U[i] with a constant!

Do you have any ideas that could help me?

Thanks.
