Reputation: 1750
OK, so I have a huge array; let's call it J.
Each element of J has an associated array TJ, but the length of TJ varies from one element of J to the next.
For example, the sequential procedure would look something like this:
    for (J = 0; J < length(ARRAY_J); J++) {
        // length(ARRAY_TJ) is different for each J
        for (T = 0; T < length(ARRAY_TJ); T++) {
            ARRAY_RESULT[J] += ARRAY_J[J] + ARRAY_TJ[T];
        }
    }
So I figured that if I arrange my threads in 2D blocks, I can use the thread's x index for J and its y index for T.
I know the length of J, but the length of T varies, so I don't know how to define this in CUDA.
For example:
    ARRAY_RESULT[blockIdx.x*blockDim.x+threadIdx.x] += ARRAY_J[blockIdx.x*blockDim.x+threadIdx.x] + ARRAY_TJ[blockIdx.y*blockDim.y+threadIdx.y];
So how should I define the block dimensions here, given that the length of ARRAY_TJ is variable? Should I use the maximum ARRAY_TJ length? And would code like the above work, i.e. for each value of ARRAY_J, would it sum length(ARRAY_TJ) values?
Upvotes: 0
Views: 63
Reputation: 1599
I think it would be better to use 1D blocks, with one thread per element of J, and in each thread do:
    int thread = blockIdx.x * blockDim.x + threadIdx.x;
    for (T = 0; T < length(ARRAY_TJ); T++)
        ARRAY_RESULT[thread] += ARRAY_J[thread] + ARRAY_TJ[T];
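A fuller sketch of this one-thread-per-element layout might look like the following. It assumes, which the question does not state, that all the TJ arrays are packed back to back into one flat array (here called `tj_flat`) with a hypothetical `tj_offset` array of length n+1 marking where each element's slice starts and ends. The kernel and wrapper names are illustrative, the element type is assumed to be float, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per element j. The slice tj_flat[tj_offset[j] .. tj_offset[j+1])
// holds the TJ values of element j (assumed layout, not from the question).
__global__ void sumPerElement(const float* array_j, const float* tj_flat,
                              const int* tj_offset, float* result, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;               // the grid may be larger than n

    float acc = 0.0f;                 // private accumulator: no write conflicts
    for (int t = tj_offset[j]; t < tj_offset[j + 1]; ++t)
        acc += array_j[j] + tj_flat[t];
    result[j] = acc;                  // assumes the result starts at zero
}

// Hypothetical host wrapper: copy inputs in, launch, copy the result back.
std::vector<float> sumPerElementHost(const std::vector<float>& array_j,
                                     const std::vector<float>& tj_flat,
                                     const std::vector<int>& tj_offset)
{
    int n = static_cast<int>(array_j.size());
    std::vector<float> result(n, 0.0f);
    if (n == 0) return result;

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int* d_off = nullptr;
    cudaMalloc(&d_j, n * sizeof(float));
    cudaMalloc(&d_tj, tj_flat.size() * sizeof(float));
    cudaMalloc(&d_off, (n + 1) * sizeof(int));
    cudaMalloc(&d_res, n * sizeof(float));

    cudaMemcpy(d_j, array_j.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_flat.data(), tj_flat.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offset.data(), (n + 1) * sizeof(int),
               cudaMemcpyHostToDevice);

    int threads = 256;                          // 1D block
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    sumPerElement<<<blocks, threads>>>(d_j, d_tj, d_off, d_res, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_j); cudaFree(d_tj); cudaFree(d_off); cudaFree(d_res);
    return result;
}
```

For example, with ARRAY_J = {1, 2, 3} and TJ arrays {10, 20}, {} and {5}, the packed inputs would be `tj_flat = {10, 20, 5}` and `tj_offset = {0, 2, 2, 3}`, and the result would be {32, 0, 8}. Because the block size no longer depends on the TJ lengths, the variable length only affects how many loop iterations each thread runs.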
If you try to do it in 2D, with the second dimension for the TJ array, more than one thread will write to the same position of ARRAY_RESULT at the same time (with all the problems that brings), and there is no easy way to manage critical sections in CUDA.
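If the 2D layout is still wanted, one way to sidestep the write conflict is `atomicAdd`, which makes each update to a result element indivisible at the cost of serializing threads that hit the same element. The sketch below is only an illustration under assumptions not in the question: float elements, all TJ arrays packed into a flat `tj_flat`, and a hypothetical `tj_offset` array of length n+1 giving each element's slice. The y dimension of the grid is sized for the longest TJ, and threads past the end of a shorter slice simply exit.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Thread (x, y): x selects element j, y selects position t inside j's slice.
__global__ void sumPerElement2D(const float* array_j, const float* tj_flat,
                                const int* tj_offset, float* result, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int t = blockIdx.y * blockDim.y + threadIdx.y;
    if (j >= n) return;
    int len = tj_offset[j + 1] - tj_offset[j];
    if (t >= len) return;             // this element's TJ is shorter than the grid
    // Many threads share result[j], so the update must be atomic.
    atomicAdd(&result[j], array_j[j] + tj_flat[tj_offset[j] + t]);
}

// Hypothetical host wrapper (error checking omitted for brevity).
std::vector<float> sumPerElement2DHost(const std::vector<float>& array_j,
                                       const std::vector<float>& tj_flat,
                                       const std::vector<int>& tj_offset)
{
    int n = static_cast<int>(array_j.size());
    std::vector<float> result(n, 0.0f);

    int maxLen = 0;                   // longest TJ slice decides the grid's y size
    for (int j = 0; j < n; ++j)
        maxLen = std::max(maxLen, tj_offset[j + 1] - tj_offset[j]);
    if (n == 0 || maxLen == 0) return result;   // nothing to sum

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int* d_off = nullptr;
    cudaMalloc(&d_j, n * sizeof(float));
    cudaMalloc(&d_tj, tj_flat.size() * sizeof(float));
    cudaMalloc(&d_off, (n + 1) * sizeof(int));
    cudaMalloc(&d_res, n * sizeof(float));

    cudaMemcpy(d_j, array_j.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_flat.data(), tj_flat.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offset.data(), (n + 1) * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_res, 0, n * sizeof(float));    // atomicAdd accumulates from zero

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (maxLen + block.y - 1) / block.y);
    sumPerElement2D<<<grid, block>>>(d_j, d_tj, d_off, d_res, n);

    cudaMemcpy(result.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_j); cudaFree(d_tj); cudaFree(d_off); cudaFree(d_res);
    return result;
}
```

When the TJ lengths differ a lot, many threads in this layout exit immediately, so the 1D version is often the simpler starting point.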
Upvotes: 1