FundamentalAxiom

Would there be any performance difference between CUDA block sizes 1024x1 and 32x32?

How are these two block sizes (1024x1 vs 32x32) expected to perform from a thread scheduling and memory bandwidth perspective? Is there any expected difference in performance between them? Note that both use 1024 threads per block.

Answers (1)

Robert Crovella

Threadblock dimensions, especially when we are talking about the same number of threads per block, don't by themselves affect performance.

Threads are still grouped for execution into warps: in both cases the 1024 threads form 32 warps of 32 threads each. The only direct effect of the threadblock dimensions is to change the values of the built-in variables each thread sees (threadIdx.x, threadIdx.y, blockDim.x, etc.), which is not a performance issue.
