Reputation: 12273
When I pass the grid size and thread count at kernel call, will these values be placed in gridDim and blockDim always and exactly as I passed them?
And, moreover, will blockIdx and threadIdx always respect these limits?
In other words, does calling

kernel<<<5, 7>>>()

always guarantee that, inside the kernel,

gridDim.x == 5 && blockIdx.x < gridDim.x
blockDim.x == 7 && threadIdx.x < blockDim.x

hold? (And likewise for 2D and 3D sizes and indices?)
I know this may sound like a silly question, but I'm wondering whether CUDA is allowed to ignore these limits for resource allocation, so that the programmer is always required to check.
Hoping it's clear, thanks!
Upvotes: 2
Views: 3355
Reputation: 3509
Yes, if you launch your kernel with the configuration <<<5,7>>>, it will have exactly 5 blocks and 7 threads per block. Note, however, that you are most efficient when you operate within the limits of your GPU. For maximum speed, read the warp size from the device properties: use as many threads as you need, but make the block size a multiple of the warp size.
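As a minimal sketch of that last point, you can query the warp size at runtime via cudaGetDeviceProperties and round your desired block size up to the next multiple (the value 100 here is just a hypothetical thread count; warpSize is 32 on current NVIDIA hardware, but reading it from the device is the portable approach):

```cuda
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0

    int desired   = 100;                 // threads we actually need per block (example value)
    int warp      = prop.warpSize;       // typically 32
    int blockSize = ((desired + warp - 1) / warp) * warp;  // round up to a warp multiple

    printf("warpSize = %d, rounded block size = %d\n", warp, blockSize);
    return 0;
}
```

With a warp size of 32, a desired count of 100 threads would be rounded up to a block size of 128.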
CUDA itself never changes your thread or block counts to a different size, so your index arithmetic is safe without extra checks.
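If you want to convince yourself empirically, a small test kernel can assert the invariants from the question directly on the device. This is a sketch only: it assumes a CUDA-capable GPU and device-side assert support (compute capability 2.0 or later); a failed assertion would be reported when the host synchronizes.

```cuda
#include <cstdio>
#include <cassert>

__global__ void checkDims()
{
    // Every one of the 5 * 7 = 35 threads should see exactly
    // the launch configuration we passed, with indices in range.
    assert(gridDim.x == 5 && blockIdx.x < gridDim.x);
    assert(blockDim.x == 7 && threadIdx.x < blockDim.x);
}

int main()
{
    checkDims<<<5, 7>>>();
    // Device-side assert failures surface at synchronization.
    cudaError_t err = cudaDeviceSynchronize();
    printf("launch check: %s\n", cudaGetErrorString(err));
    return 0;
}
```

If any thread saw different dimensions, the assert would fire and cudaDeviceSynchronize would return an error instead of cudaSuccess.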
Upvotes: 4