Reputation: 467
So, here's the question. I want to do a computation in CUDA where I have a large 1D array (which represents a lattice), I partition it into subarrays of length #part, and I want each thread to do a couple of computations on each subarray.
More specifically, let's say that we have a number of threads, #threads, and a number of blocks, #blocks. The array is of size N = 2 * #part * #threads * #blocks. If we number the subarrays from 1 to 2*#blocks*#threads, we want to first use the #threads*#blocks threads to do computation on the subarrays with an even number and then the same number of threads to do computation on the subarrays with an odd number.
I thought that each thread could keep a local index denoting where its subarray starts.
So I used the following index:
localIndex = #part * (2 * threadIdx.x + var) + 2 * #part * #threads * blockIdx.x;
var is either 0 or 1, depending on whether the thread should do its computation on an even-numbered or an odd-numbered subarray.
I've tried to run it, and something seems to go wrong when I use more than one block. Have I done something wrong with the indexing?
Thanks.
Upvotes: 1
Views: 358
Reputation: 5554
Why is it important that the threads collectively process the even subarrays first and then the odd ones? Block and thread execution order is not guaranteed, so that ordering on its own gives no benefit.
Assuming your kernel launch uses only the x dimension for both the grid and the blocks:
subArrayIndexEven = 2 * (blockIdx.x * blockDim.x + threadIdx.x) * part
subArrayIndexOdd = subArrayIndexEven + part
Proof by example:
BLOCK_SIZE = 3
NUM_OF_BLOCKS = 2
PART = 4
N = 2 * 3 * 2 * 4 = 48
T(threadIdx.x, blockIdx.x)
T(0, 1) -> even = 2 * (1 * 3 + 0) * 4 = 24, odd = 28
T(1, 1) -> even = 2 * (1 * 3 + 1) * 4 = 32, odd = 36
T(2, 1) -> even = 2 * (1 * 3 + 2) * 4 = 40, odd = 44
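The same check can be run for every thread of every block on the host. The following is only a sketch in plain C++ that mirrors the index arithmetic above, with no GPU needed. The function names (evenStart, oddStart, questionIndex, coversAllSubarrays) are made up for illustration, and questionIndex assumes that #Nthreads in the question means the number of threads per block (blockDim.x).

```cpp
#include <cassert>
#include <vector>

// Start of the even-numbered subarray owned by thread (block, thread).
// Mirrors: 2 * (blockIdx.x * blockDim.x + threadIdx.x) * part
int evenStart(int block, int thread, int blockSize, int part) {
    return 2 * (block * blockSize + thread) * part;
}

// Start of the odd-numbered subarray: one subarray length further on.
int oddStart(int block, int thread, int blockSize, int part) {
    return evenStart(block, thread, blockSize, part) + part;
}

// The expression from the question, reading #Nthreads as blockDim.x
// (an assumption, the question does not say so explicitly).
int questionIndex(int block, int thread, int blockSize, int part, int var) {
    return part * (2 * thread + var) + 2 * part * blockSize * block;
}

// Hypothetical helper: returns true if every subarray of an array of
// length 2 * part * blockSize * numBlocks is started exactly once.
bool coversAllSubarrays(int numBlocks, int blockSize, int part) {
    assert(numBlocks > 0 && blockSize > 0 && part > 0);
    const int numSubarrays = 2 * numBlocks * blockSize;
    std::vector<int> hits(numSubarrays, 0);
    for (int b = 0; b < numBlocks; ++b) {
        for (int t = 0; t < blockSize; ++t) {
            // Each start is a multiple of part, so dividing gives the subarray number.
            ++hits[evenStart(b, t, blockSize, part) / part];
            ++hits[oddStart(b, t, blockSize, part) / part];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

With BLOCK_SIZE = 3, NUM_OF_BLOCKS = 2 and PART = 4, calling coversAllSubarrays(2, 3, 4) checks that the 12 subarrays are each hit once. Under the stated assumption, questionIndex with var = 0 or 1 expands algebraically to the same values as evenStart and oddStart, which suggests the expression in the question is fine as long as #Nthreads really is the number of threads per block.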
Upvotes: 1
Reputation: 77494
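// Assumes one thread per element: idx is the global thread index.
int idx = threads_per_block * blockIdx.x + threadIdx.x;
// Which even/odd pair of subarrays this thread works on (integer division, no float round trip needed).
int my_offset = idx / num_part;
// Position of this thread's element inside its subarray.
int my_elem = idx % num_part;
// Subarray 2*my_offset (even) and subarray 2*my_offset + 1 (odd) start here.
int my_even_offset = 2 * my_offset * num_part;
int my_odd_offset = (2 * my_offset + 1) * num_part;
// idx already contains my_offset * num_part, so add only the position inside the subarray.
int my_even_idx = my_even_offset + my_elem;
int my_odd_idx = my_odd_offset + my_elem;
// Do stuff with the indices.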
Upvotes: 1