hexpheus

Reputation: 761

Wrong scan pseudocode by CUDA?

I'm trying to implement the pseudocode for the prefix-sum (scan) operation given in the CUDA documentation. The results I'm getting are wrong. I have revised my code many times but still cannot find the problem. Here is the pseudocode given by CUDA:

for d = 1 to log2 n do
    for all k in parallel do
        if k >= power(2, d) then
            x[k] = x[k - power(2, d-1)] + x[k]

And the CUDA kernel I've coded so far is:

// CUDA Kernel
__global__ void
prefixSumCUDA(int *a, size_t n)
{
    int tId = threadIdx.x;

    for (int offset = 1; offset < n; offset *= 2) {
        if (tId >= pow((float)2, offset)) {
            int temp = tId - pow((float)2, offset - 1);
            a[tId] += a[temp];
        }
    }
}

Please let me know where I am going wrong. Since the result depends on the block and grid dimensions, here is my kernel launch:

// Kernel launch
prefixSumCUDA<<<1, 32>>>(d_A, n);

The input is an 8-element integer array:

[-] array: 1, 2, 3, 4, 5, 6, 7, 8

And the result of the CUDA kernel is as follows:

[-] array: 1, 2, 5, 7, 14, 18, 22, 26

Thanks for any help in advance!

Upvotes: 1

Views: 46

Answers (1)

hexpheus

Reputation: 761

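I solved the problem by restructuring the loop. In my original kernel, offset doubled on every iteration and was then also used as an exponent in pow, so the strides were wrong. Instead, offset should be the exponent itself: it starts from 0 rather than 1, increases by one per step, and gives a stride of 1 << offset. This results in the following code: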

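// Inclusive scan in place. Assumes a single block with at least n threads.
__global__ void
prefixSumCUDA(int *a, size_t n)
{
    int tId = threadIdx.x;

    // Number of doubling steps: ceil(log2(n)).
    int end = (int)ceil(log2((float)n));

    for (int offset = 0; offset < end; offset++) {
        int stride = 1 << offset;
        // Threads past the end of the array, or too close to its start, sit this step out.
        bool active = (size_t)tId < n && tId >= stride;
        // Read first, then synchronize, so no thread sees a value already updated in this step.
        int addend = active ? a[tId - stride] : 0;
        __syncthreads();
        if (active) {
            a[tId] += addend;
        }
        // Finish all writes before the next step starts reading.
        __syncthreads();
    }
}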
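
To check the index arithmetic without a GPU, the two loops can be traced on the host. The following is only a rough sketch in plain C++ (it should also compile as host code under nvcc). The helper names scanStep, scanCorrected and scanAsInQuestion are made up for this illustration, and the snapshot copy assumes every thread reads its operand before any thread writes within a step.

```cpp
#include <cstddef>
#include <vector>

// One synchronous scan step (hypothetical helper): every element k >= threshold
// adds the value that sat `stride` positions to its left before the step began.
// Working from a copy models "all threads read, then all threads write".
std::vector<int> scanStep(const std::vector<int>& x, std::size_t threshold, std::size_t stride)
{
    std::vector<int> out = x;
    for (std::size_t k = 0; k < x.size(); ++k) {
        if (k >= threshold && k >= stride) {
            out[k] = x[k - stride] + x[k];
        }
    }
    return out;
}

// Strides 1, 2, 4, ... as in the answer's kernel: threshold == stride == 1 << offset.
std::vector<int> scanCorrected(std::vector<int> x)
{
    for (std::size_t stride = 1; stride < x.size(); stride *= 2) {
        x = scanStep(x, stride, stride);
    }
    return x;
}

// Index arithmetic of the question's kernel: offset doubles (1, 2, 4, ...) and is
// then used as an exponent, giving threshold 2^offset and stride 2^(offset - 1).
std::vector<int> scanAsInQuestion(std::vector<int> x)
{
    for (std::size_t offset = 1; offset < x.size(); offset *= 2) {
        // Guard the shift; a threshold this large exceeds every index anyway.
        if (offset >= 8 * sizeof(std::size_t)) break;
        x = scanStep(x, std::size_t(1) << offset, std::size_t(1) << (offset - 1));
    }
    return x;
}
```

For the input 1, 2, 3, 4, 5, 6, 7, 8, scanCorrected returns 1, 3, 6, 10, 15, 21, 28, 36, while scanAsInQuestion returns 1, 2, 5, 7, 14, 18, 22, 26, which is exactly the wrong output from the question. Its steps use thresholds 2, 4, 16 with strides 1, 2, 8 instead of 1, 2, 4 for both.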

Upvotes: 1
