Atirag

Reputation: 1750

Processing multiple variable-sized arrays with CUDA

OK, so I have a huge array; let's call it J.

For each element of J there is an associated array TJ, and the length of TJ varies from one element of J to the next.

For example, the sequential procedure would look something like this:

for (J = 0; J < length(ARRAY_J); J++)
do
  // ARRAY_TJ is the array associated with element J
  for (T = 0; T < length(ARRAY_TJ); T++)
  do
    ARRAY_RESULT[J] += ARRAY_J[J] + ARRAY_TJ[T]
  end
end

So I figured that if I arrange my threads in 2D blocks, I can use the x index of the thread for J and the y index for T.

I know the length of J, but the length of TJ varies, so I don't know how to define this in CUDA.

For example

ARRAY_RESULT[blockIdx.x*blockDim.x+threadIdx.x] += ARRAY_J[blockIdx.x*blockDim.x+threadIdx.x] + ARRAY_TJ[blockIdx.y*blockDim.y+threadIdx.y]

So how should I define the block dimensions here, given that the length of ARRAY_TJ is variable? Should I use the length of the longest ARRAY_TJ? And would code like the line above even work, that is, would it add up length(ARRAY_TJ) values for each element of ARRAY_J?

Upvotes: 0

Views: 63

Answers (1)

Evans

Reputation: 1599

I think it would be better to use 1D blocks with one thread per element of J (length(ARRAY_J) threads in total), and have each thread do:

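// one thread per element of ARRAY_J; ARRAY_TJ is the array associated with that element
int thread = blockIdx.x * blockDim.x + threadIdx.x;
if (thread < length(ARRAY_J))   // skip surplus threads in the last block
    for (int T = 0; T < length(ARRAY_TJ); T++)
        ARRAY_RESULT[thread] += ARRAY_J[thread] + ARRAY_TJ[T];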
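
One way to make length(ARRAY_TJ) concrete inside a kernel is to pack all the TJ arrays into a single flat buffer and keep an offsets array next to it, so that each thread loops over its own slice. The following is only a sketch under those assumptions: float data, a result array that starts at zero, and a layout where the TJ values for element j sit between tj_offsets[j] and tj_offsets[j + 1]. The names sumRagged, tj_flat, tj_offsets and sum_ragged_host are made up for illustration and do not come from the question, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per element of ARRAY_J. Assumed layout: tj_offsets has n + 1
// entries and the TJ array of element j is tj_flat[tj_offsets[j] .. tj_offsets[j + 1]).
__global__ void sumRagged(const float *array_j, const float *tj_flat,
                          const int *tj_offsets, float *result, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;                 // the last block may have surplus threads

    float acc = 0.0f;                   // private accumulator: no two threads share it
    for (int t = tj_offsets[j]; t < tj_offsets[j + 1]; ++t)
        acc += array_j[j] + tj_flat[t];
    result[j] = acc;                    // assumes the result starts at zero
}

// Hypothetical host wrapper: copy inputs to the device, launch, copy back.
// CUDA error checking is omitted to keep the sketch short.
std::vector<float> sum_ragged_host(const std::vector<float> &array_j,
                                   const std::vector<float> &tj_flat,
                                   const std::vector<int> &tj_offsets)
{
    const int n = static_cast<int>(array_j.size());
    std::vector<float> result(n, 0.0f);
    if (n == 0) return result;

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int *d_off = nullptr;
    cudaMalloc(&d_j, n * sizeof(float));
    cudaMalloc(&d_tj, tj_flat.size() * sizeof(float));
    cudaMalloc(&d_off, tj_offsets.size() * sizeof(int));
    cudaMalloc(&d_res, n * sizeof(float));
    cudaMemcpy(d_j, array_j.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_flat.data(), tj_flat.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offsets.data(), tj_offsets.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    const int threads = 256;                        // 1D block size (arbitrary choice)
    const int blocks = (n + threads - 1) / threads; // enough blocks to cover all of J
    sumRagged<<<blocks, threads>>>(d_j, d_tj, d_off, d_res, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_j);
    cudaFree(d_tj);
    cudaFree(d_off);
    cudaFree(d_res);
    return result;
}
```

With this layout the block dimensions depend only on the length of J; the varying TJ lengths show up only as different loop bounds per thread.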

If you try to do it in 2D, with the second dimension covering the TJ array, several threads will write to the same element of ARRAY_RESULT at the same time (a race condition, with all the problems that brings), and CUDA has no easy way to manage critical sections.
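
To illustrate the conflict: in a 2D layout every thread that shares a J index adds into the same ARRAY_RESULT slot. If that layout were wanted anyway, one possible workaround is to route each update through atomicAdd, which makes the colliding writes safe but serializes them. The sketch below assumes float data (atomicAdd on float needs compute capability 2.0 or later), a flat tj_flat buffer with a tj_offsets array of n + 1 entries, and a grid whose y extent covers the longest TJ. The names sumRagged2D and sum_ragged_2d_host and the data layout are assumptions for illustration, not part of the question, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// 2D variant: x indexes ARRAY_J, y indexes a position inside that element's TJ.
__global__ void sumRagged2D(const float *array_j, const float *tj_flat,
                            const int *tj_offsets, float *result, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int t = blockIdx.y * blockDim.y + threadIdx.y;
    if (j >= n) return;
    int len = tj_offsets[j + 1] - tj_offsets[j];
    if (t >= len) return;               // threads past the end of a short TJ do nothing
    // Many threads share result[j], so the update has to be atomic.
    atomicAdd(&result[j], array_j[j] + tj_flat[tj_offsets[j] + t]);
}

// Hypothetical host wrapper; the grid's y extent is sized for the longest TJ.
std::vector<float> sum_ragged_2d_host(const std::vector<float> &array_j,
                                      const std::vector<float> &tj_flat,
                                      const std::vector<int> &tj_offsets)
{
    const int n = static_cast<int>(array_j.size());
    std::vector<float> result(n, 0.0f);
    int max_len = 0;
    for (int j = 0; j < n; ++j)
        max_len = std::max(max_len, tj_offsets[j + 1] - tj_offsets[j]);
    if (n == 0 || max_len == 0) return result;   // nothing to add

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int *d_off = nullptr;
    cudaMalloc(&d_j, n * sizeof(float));
    cudaMalloc(&d_tj, tj_flat.size() * sizeof(float));
    cudaMalloc(&d_off, tj_offsets.size() * sizeof(int));
    cudaMalloc(&d_res, n * sizeof(float));
    cudaMemcpy(d_j, array_j.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_flat.data(), tj_flat.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offsets.data(), tj_offsets.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_res, 0, n * sizeof(float));     // atomicAdd accumulates from zero

    dim3 block(16, 16);                          // arbitrary 2D block size
    dim3 grid((n + block.x - 1) / block.x, (max_len + block.y - 1) / block.y);
    sumRagged2D<<<grid, block>>>(d_j, d_tj, d_off, d_res, n);

    cudaMemcpy(result.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_j);
    cudaFree(d_tj);
    cudaFree(d_off);
    cudaFree(d_res);
    return result;
}
```

Note that when the TJ lengths differ a lot, most threads in this grid exit without doing any work, which is one more reason the 1D version is usually the simpler choice.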

Upvotes: 1
