RJay khadka

Reputation: 9

A kernel with less thread divergence

Expected value of result: 8. Actual value of result: 1. The result should be 8, but the program prints 1. Can anyone pinpoint what is wrong with this code?

#include <stdio.h>
#include <assert.h>
//define array size 8
#define ARRAY_SIZE 8
__global__ void vecAddKernel(int *A_d) {
    // thread index
    unsigned int t = threadIdx.x;
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        __syncthreads();
        if (t < stride)
            A_d[t] += A_d[t + stride];
    }
}

int main(int argc, char **argv) {
    int A_h[ARRAY_SIZE];
    // initialize all values in A_h to 1
    for (int i = 0; i < ARRAY_SIZE; i++) {
        A_h[i] = 1;
    }
    int *A_d, result;
    // allocate device array A_d with ARRAY_SIZE ints
    cudaMalloc((void **)&A_d, ARRAY_SIZE * sizeof(int));

    cudaMemcpy(A_d, A_h, ARRAY_SIZE * sizeof(int), cudaMemcpyHostToDevice);

    vecAddKernel<<<1, ARRAY_SIZE / 2>>>(A_d);
    // copy the first element of A_d to result
    cudaMemcpy(&result, &A_d[0], sizeof(int), cudaMemcpyDeviceToHost);
    // print the value of result
    printf("Result = %d\n", result);
    // free the device memory
    cudaFree(A_d);
}

Upvotes: 0

Views: 67

Answers (1)

Thomas Foster

Reputation: 321

I'm not sure how you're getting Result = 1.

When I compile and run your code, I see Result = 4. The kernel is launched with ARRAY_SIZE / 2 = 4 threads, so blockDim.x is already ARRAY_SIZE / 2. The first iteration of the loop should add pairs of values separated by ARRAY_SIZE / 2, so the initial value of stride should be blockDim.x rather than blockDim.x / 2.

Replacing blockDim.x / 2 with blockDim.x in the initializer of unsigned int stride renders the program correct.
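To see where the 4 and the 8 come from, the kernel's loop can be traced on the host. The following is only a rough sketch: simulateReduction, numThreads and firstStride are hypothetical names that do not appear in the question's code, and the inner loop over t stands in for the threads of one block running between two __syncthreads() calls. Running the threads one after another is a fair stand-in here because each step only writes indices below stride and only reads its partner from indices at or above stride.

```cpp
#include <vector>

// Host-side emulation of the reduction loop in vecAddKernel.
// Hypothetical helper for illustration; it is not CUDA device code.
// numThreads plays the role of blockDim.x, firstStride is the loop's
// initial stride value.
int simulateReduction(unsigned int numThreads, unsigned int firstStride) {
    std::vector<int> a(8, 1);  // same contents as A_h: eight ones
    for (unsigned int stride = firstStride; stride > 0; stride /= 2) {
        // One pass over t models all threads of the block for this step.
        for (unsigned int t = 0; t < numThreads; ++t) {
            if (t < stride)
                a[t] += a[t + stride];
        }
    }
    return a[0];  // the element the host copies back as result
}
```

With 4 threads, simulateReduction(4, 2) models the original initializer blockDim.x / 2 and returns 4, because elements 4 to 7 are never read. simulateReduction(4, 4) models the initializer blockDim.x and returns 8.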

If you're interested in performing array reductions like this, you might want to look at __shfl_down and the other shuffle functions introduced with Kepler (on CUDA 9 and later, use the __shfl_down_sync variants, which replace the deprecated __shfl_down): https://devblogs.nvidia.com/parallelforall/faster-parallel-reductions-kepler/

Upvotes: 1
