Processing multiple sized arrays with CUDA

Question

OK, so I have a huge array; let's call it J.

For each element of J there is an associated array TJ, but the length of TJ varies from one element of J to the next.

So, for example, the sequential procedure would look something like this:

for(J=0;J



So I figured that if I arrange my threads in 2D blocks, I can use the x index of the thread for J and the y index of the thread for T.

I know the length of J, but the length of T varies, so I don't know how to define this in CUDA.

For example:

ARRAY_RESULT[blockIdx.y*blockDim.y+threadIdx.y] += ARRAY_J[blockIdx.y*blockDim.y+threadIdx.y] + ARRAY_TJ[blockIdx.x*blockDim.x+threadIdx.x];


So how should I define the block dimensions here, given that the length of ARRAY_TJ is variable? Should I use the maximum ARRAY_TJ length? But then would code like the above work? For each value of ARRAY_J, will it sum length(ARRAY_TJ) values?

Evans · Accepted Answer

I think it would be better to use 1D blocks, with one thread per element of J, and in each thread do:

int thread = blockIdx.x * blockDim.x + threadIdx.x;
for(T=0;T



If you try to do it in 2D, with the second dimension for the TJ array, more than one thread will write to the same position of ARRAY_RESULT at the same time (a data race, with all the problems that carries), and there is no easy way to manage critical sections in CUDA.
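As a rough sketch of the one-thread-per-element-of-J idea described above: the question does not say how the TJ arrays are stored, so this assumes they are flattened into a single array with an offsets array marking where each slice starts and ends. The names `sum_per_j`, `run_sum_per_j`, `tj_values`, and `tj_offsets` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per element of J; each thread loops over its own TJ slice.
// ASSUMED layout: the TJ values for element j are stored in
// tj_values[tj_offsets[j] .. tj_offsets[j + 1]), so tj_offsets has num_j + 1 entries.
__global__ void sum_per_j(const float *array_j, const float *tj_values,
                          const int *tj_offsets, float *result, int num_j)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= num_j) return;  // guard for the last, partially filled block

    float acc = 0.0f;
    for (int t = tj_offsets[j]; t < tj_offsets[j + 1]; ++t)
        acc += array_j[j] + tj_values[t];
    result[j] = acc;  // only this thread writes result[j], so there is no race
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel with a 1D grid, and copies the results back.
std::vector<float> run_sum_per_j(const std::vector<float> &array_j,
                                 const std::vector<float> &tj_values,
                                 const std::vector<int> &tj_offsets)
{
    int num_j = static_cast<int>(array_j.size());
    std::vector<float> result(num_j, 0.0f);

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int *d_off = nullptr;
    cudaMalloc(&d_j, num_j * sizeof(float));
    cudaMalloc(&d_tj, tj_values.size() * sizeof(float));
    cudaMalloc(&d_off, tj_offsets.size() * sizeof(int));
    cudaMalloc(&d_res, num_j * sizeof(float));

    cudaMemcpy(d_j, array_j.data(), num_j * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_values.data(), tj_values.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offsets.data(), tj_offsets.size() * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;                             // threads per 1D block
    int blocks = (num_j + threads - 1) / threads;  // enough blocks to cover J
    sum_per_j<<<blocks, threads>>>(d_j, d_tj, d_off, d_res, num_j);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), d_res, num_j * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_j); cudaFree(d_tj); cudaFree(d_off); cudaFree(d_res);
    return result;
}

int main()
{
    // Three elements of J whose TJ arrays have lengths 2, 0, and 3.
    std::vector<float> array_j   = {1.0f, 2.0f, 3.0f};
    std::vector<float> tj_values = {10.0f, 20.0f, 1.0f, 2.0f, 3.0f};
    std::vector<int>   tj_offsets = {0, 2, 2, 5};

    std::vector<float> result = run_sum_per_j(array_j, tj_values, tj_offsets);
    for (size_t j = 0; j < result.size(); ++j)
        printf("result[%zu] = %.1f\n", j, result[j]);
    return 0;
}
```

With this layout the grid only depends on the length of J, so there is no need to size anything by the longest TJ. The trade-off is that threads with long TJ slices run longer than their neighbours, which may matter if the lengths vary a lot.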
