Kostis

Reputation: 467

Checkerboard indexing in CUDA

So, here's the question. I want to do a computation in CUDA on a large 1D array that represents a lattice. I partition the array into subarrays of length #part, and I want each thread to do a couple of computations on a subarray.

More specifically, let's say that we launch #blocks blocks of #threads threads each. The array has size N = 2 * #part * #threads * #blocks. If we number the subarrays from 1 to 2 * #blocks * #threads, we first want all #threads * #blocks threads to work on the even-numbered subarrays, and then the same threads to work on the odd-numbered ones.

My idea was to give each thread a local index marking where its subarray starts.

So I used the following index:

localIndex = #part * (2 * threadIdx.x + var) + 2 * #part * #Nthreads * blockIdx.x;

var is either 1 or 0, depending on whether the thread should work on an even- or an odd-numbered subarray.
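One way to test this formula is to evaluate it on the host for every (block, thread, var) combination and check that each subarray start 0, #part, 2 * #part, ..., N - #part comes out exactly once. The sketch below is plain C++ (it also compiles as host code in a .cu file); `local_index` and `covers_every_subarray_once` are hypothetical helper names, and the `nthreads` argument is whatever value is substituted for #Nthreads. If #Nthreads held the total thread count (#threads * #blocks) instead of the threads per block, block 0 would still be correct but every later block would start at or beyond N, so that is one possible cause worth ruling out when the failure appears only with more than one block.

```cpp
#include <cassert>
#include <vector>

// Host-side copy of the question's formula:
// localIndex = part * (2 * threadIdx.x + var) + 2 * part * Nthreads * blockIdx.x
long long local_index(long long thread, long long block,
                      long long nthreads, long long part, int var) {
    return part * (2 * thread + var) + 2 * part * nthreads * block;
}

// Returns true if, over all blocks, all threads and var in {0, 1}, every
// subarray start 0, part, ..., N - part is produced exactly once.
// threads_per_block is the real launch size; nthreads is the value
// plugged into the formula in place of #Nthreads.
bool covers_every_subarray_once(long long blocks, long long threads_per_block,
                                long long part, long long nthreads) {
    assert(blocks > 0 && threads_per_block > 0 && part > 0);
    const long long n = 2 * part * threads_per_block * blocks;  // array size N
    std::vector<int> hits(n / part, 0);  // one counter per subarray
    for (long long b = 0; b < blocks; ++b)
        for (long long t = 0; t < threads_per_block; ++t)
            for (int var = 0; var <= 1; ++var) {
                long long start = local_index(t, b, nthreads, part, var);
                if (start < 0 || start >= n || start % part != 0) return false;
                ++hits[start / part];
            }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

For example, `covers_every_subarray_once(2, 3, 4, 3)` returns true, while passing 6 (the total thread count) as the last argument returns false because block 1 then starts at index 48, which is N.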

I've tried running it, and something seems to go wrong when I use more than one block. Have I made a mistake in the indexing?

Thanks.

Upvotes: 1

Views: 358

Answers (2)

djmj

Reputation: 5554

Why is it important that the threads collectively process first the even and then the odd subarrays? Block and thread execution order is not guaranteed, so I don't see any benefit.

Assuming you index using only the x dimension in your launch configuration:

int subArrayIndexEven = 2 * (blockIdx.x * blockDim.x + threadIdx.x) * part;
int subArrayIndexOdd  = subArrayIndexEven + part;
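These two formulas can be compared with the question's localIndex. With #Nthreads = blockDim.x, expanding #part * (2 * threadIdx.x + var) + 2 * #part * #Nthreads * blockIdx.x gives 2 * (blockIdx.x * blockDim.x + threadIdx.x) * part + var * part, i.e. subArrayIndexEven for var = 0 and subArrayIndexOdd for var = 1. The plain C++ sketch below checks this numerically on the host; the helper names are hypothetical and `block_dim` stands in for blockDim.x.

```cpp
#include <cassert>

// This answer's formulas, with block_dim standing in for blockDim.x.
long long sub_array_index_even(long long thread, long long block,
                               long long block_dim, long long part) {
    return 2 * (block * block_dim + thread) * part;
}

long long sub_array_index_odd(long long thread, long long block,
                              long long block_dim, long long part) {
    return sub_array_index_even(thread, block, block_dim, part) + part;
}

// The question's formula, assuming #Nthreads == blockDim.x.
long long question_local_index(long long thread, long long block,
                               long long block_dim, long long part, int var) {
    return part * (2 * thread + var) + 2 * part * block_dim * block;
}

// True if both formulas agree for every thread of every block.
bool formulas_agree(long long blocks, long long block_dim, long long part) {
    assert(blocks > 0 && block_dim > 0 && part > 0);
    for (long long b = 0; b < blocks; ++b)
        for (long long t = 0; t < block_dim; ++t) {
            if (sub_array_index_even(t, b, block_dim, part) !=
                question_local_index(t, b, block_dim, part, 0)) return false;
            if (sub_array_index_odd(t, b, block_dim, part) !=
                question_local_index(t, b, block_dim, part, 1)) return false;
        }
    return true;
}
```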

Check:

BLOCK_SIZE = 3
NUM_OF_BLOCKS = 2
PART = 4

N = 2 * 3 * 2 * 4 = 48

T(threadIdx.x, blockIdx.x)
T(0, 1) -> even = 2 * (1 * 3 + 0) * 4 = 24, odd = 28
T(1, 1) -> even = 2 * (1 * 3 + 1) * 4 = 32, odd = 36
T(2, 1) -> even = 2 * (1 * 3 + 2) * 4 = 40, odd = 44
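The example lists the threads of block 1. A small host-side sketch in plain C++ (the struct `ThreadStarts` and the function `example_starts` are hypothetical names) can generate the rows for every block from the same formulas, which makes it easy to confirm that the two blocks together tile the whole array.

```cpp
#include <cassert>
#include <vector>

// One row of the table: the two subarray starts owned by thread (t, b).
struct ThreadStarts {
    int thread, block, even, odd;
};

// Builds the table for every block, using
// even = 2 * (block * block_size + thread) * part and odd = even + part.
std::vector<ThreadStarts> example_starts(int block_size, int num_blocks, int part) {
    assert(block_size > 0 && num_blocks > 0 && part > 0);
    std::vector<ThreadStarts> rows;
    for (int b = 0; b < num_blocks; ++b)
        for (int t = 0; t < block_size; ++t) {
            int even = 2 * (b * block_size + t) * part;
            rows.push_back({t, b, even, even + part});
        }
    return rows;
}
```

With BLOCK_SIZE = 3, NUM_OF_BLOCKS = 2 and PART = 4, block 0 yields T(0, 0) -> 0/4, T(1, 0) -> 8/12 and T(2, 0) -> 16/20, so together with block 1 the starts are exactly 0, 4, 8, ..., 44 and all 48 elements are covered.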

Upvotes: 1

ely

Reputation: 77494

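// One thread per element pair; threads_per_block == blockDim.x.
int idx = threads_per_block * blockIdx.x + threadIdx.x;

// Which even/odd pair of subarrays this element belongs to. Integer division
// avoids the float round-trip of floor(), which loses precision for large idx.
int my_offset = idx / num_part;

// Skip the my_offset odd subarrays that precede this element...
int my_even_offset = my_offset * num_part;
// ...plus, for the odd index, the even subarray of its own pair.
int my_odd_offset = (my_offset + 1) * num_part;

int my_even_idx = idx + my_even_offset;  // = 2*my_offset*num_part + idx % num_part
int my_odd_idx  = idx + my_odd_offset;   // = my_even_idx + num_part

// Do stuff with the indices.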
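This mapping assigns one thread per element pair rather than one per subarray, so it needs N / 2 threads in total. A host-side sketch in plain C++ can check it; `even_element_index`, `odd_element_index` and `element_mapping_ok` are hypothetical names, `num_part` is the subarray length, and the even index is computed as 2 * (idx / num_part) * num_part + idx % num_part. The check confirms that even indices land only in even (0-based) subarrays, odd indices only in odd ones, and that every element is touched exactly once.

```cpp
#include <cassert>
#include <vector>

// Element-wise checkerboard mapping: global index idx handles element
// idx % num_part of the (idx / num_part)-th pair of subarrays.
int even_element_index(int idx, int num_part) {
    int pair = idx / num_part;      // integer division, no float round-trip
    return idx + pair * num_part;   // == 2 * pair * num_part + idx % num_part
}

int odd_element_index(int idx, int num_part) {
    return even_element_index(idx, num_part) + num_part;
}

// For an array of 2 * half elements: even indices must fall in even
// subarrays, odd indices in odd subarrays, every element hit exactly once.
bool element_mapping_ok(int half, int num_part) {
    assert(num_part > 0 && half % num_part == 0);
    std::vector<int> hits(2 * half, 0);
    for (int idx = 0; idx < half; ++idx) {
        int e = even_element_index(idx, num_part);
        int o = odd_element_index(idx, num_part);
        if ((e / num_part) % 2 != 0 || (o / num_part) % 2 != 1) return false;
        ++hits[e];
        ++hits[o];
    }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

With num_part = 4, index 4 maps to elements 8 and 12, the first elements of subarrays 2 and 3 (0-based).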

Upvotes: 1
