AkiRoss

Reputation: 12273

CUDA gridDim, blockDim are always user defined?

When I pass the grid size and thread count at kernel launch, will those values always appear in gridDim and blockDim exactly as I passed them?

And, moreover, will blockIdx and threadIdx always respect these limits?

In other words, calling

kernel<<<5, 7>>>()

always guarantee that, inside the kernel,

gridDim.x == 5 && blockIdx.x < gridDim.x
blockDim.x == 7 && threadIdx.x < blockDim.x

hold? (And likewise for 2D and 3D sizes and indices?)

I know this may sound like a silly question, but I'm wondering whether CUDA is allowed to ignore these limits for resource-allocation reasons, so that the programmer is always required to check them.

Hoping it's clear, thanks!

Upvotes: 2

Views: 3355

Answers (1)

SinisterMJ

Reputation: 3509

Yes, if you launch your kernel with the configuration <<<5,7>>> it will have exactly 5 blocks and 7 threads per block. Note that you are most efficient if you stay within the limits of your GPU. You should read the warp size from the device properties to get the maximum speed out of your card. Use as many threads as needed, but the block size should be a multiple of the warp size.

CUDA itself does not change your thread/block counts to another size, so you can rely on the addressing.
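As a quick sanity check, you can verify the launch configuration from inside the kernel yourself. This is a minimal sketch, assuming a device of compute capability 2.0 or higher (required for device-side printf and assert):

```cuda
#include <cassert>
#include <cstdio>

// Every thread asserts that the built-in variables match the
// launch configuration and that its indices stay in bounds.
__global__ void check()
{
    // These asserts never fire: the kernel runs with exactly
    // the grid/block shape requested at launch.
    assert(gridDim.x == 5 && blockIdx.x < gridDim.x);
    assert(blockDim.x == 7 && threadIdx.x < blockDim.x);

    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("gridDim.x = %u, blockDim.x = %u\n", gridDim.x, blockDim.x);
}

int main()
{
    check<<<5, 7>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush device printf
    return 0;
}
```

If the launch configuration exceeds the device limits (e.g. more threads per block than cudaDeviceProp::maxThreadsPerBlock allows), the kernel simply fails to launch with an error rather than silently running with clamped dimensions.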

Upvotes: 4
