One thread per block in CUDA

Question

If I have a sequence (or stream) of 2D vectors and I want to use one thread for each 2D vector, can I assign one block to each vector and launch one thread per block? Must I first convert the data to a one-dimensional array, or can the single thread access the vector elements through blockIdx.x and blockIdx.y?

And what would the kernel launch parameters be?

Assume that vsize is the number of 2D vectors (which I want to use as the number of blocks).

Would this be correct:

mykernel<<<vsize, 1>>>()

The computations on each vector are independent, and my device has compute capability 2.1.

Robert Crovella · Accepted Answer

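Yes, you can launch one thread per block with CUDA. It's generally not how you get performance out of the machine, because it leaves ~97% of the execution resources idle while that one thread is running: the hardware schedules threads in warps of 32, so a one-thread block occupies a whole warp but uses only one of its 32 lanes.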

If you want to launch one thread per block, this is the correct syntax:

mykernel<<<gridsize, 1>>>(...);

where gridsize is the number of blocks per grid you intend to launch. Launching one thread per block is often used to introduce CUDA to new programmers, but it generally should not be used for performance-oriented code.
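For comparison, here is a rough sketch of the more conventional launch, where each block holds many threads and every thread handles one vector. It assumes each "2D vector" is stored as a float2 in a flat device array; the names normKernelManyThreads, blocksFor and launchManyThreads, the block size of 256, and the squared-length computation are placeholders for illustration, not something from the question.

```cuda
#include <cuda_runtime.h>

// Each thread processes one 2D vector; the global index combines the
// block index and the thread index within the block.
__global__ void normKernelManyThreads(const float2* in, float* out, int vsize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < vsize) {                 // the last block may be only partly used
        float2 v = in[i];
        out[i] = v.x * v.x + v.y * v.y;   // placeholder computation
    }
}

// Ceiling division: number of blocks needed to cover n items.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical launch helper; d_in and d_out are assumed to be device
// pointers that were already allocated and filled by the caller.
void launchManyThreads(const float2* d_in, float* d_out, int vsize)
{
    const int threadsPerBlock = 256;   // assumed value; tune for your device
    normKernelManyThreads<<<blocksFor(vsize, threadsPerBlock), threadsPerBlock>>>(
        d_in, d_out, vsize);
}
```

With this layout, 1000 vectors need only 4 blocks of 256 threads instead of 1000 blocks of 1 thread, and the bounds check keeps the unused threads in the last block from touching memory.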

Under the above circumstances, every block will launch with a single thread, and that thread will have thread indices (threadIdx.x, threadIdx.y, and threadIdx.z) which are all zero. The block indices (blockIdx.x etc.) will be determined by your gridsize variable; if gridsize is a plain integer, blockIdx.x runs from 0 to gridsize - 1 and blockIdx.y and blockIdx.z are zero.
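To make the indexing concrete, here is a minimal sketch of the one-thread-per-block version. It assumes each "2D vector" is a pair of floats stored as a float2 in a flat one-dimensional array, so blockIdx.x alone selects the vector; the squared-length computation and the host helper squaredNormsOneThreadPerBlock are made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One block per 2D vector, one thread per block.
// threadIdx.x is always 0 here, so blockIdx.x is the only index needed.
__global__ void mykernel(const float2* in, float* out, int vsize)
{
    int i = blockIdx.x;
    if (i < vsize) {
        float2 v = in[i];
        out[i] = v.x * v.x + v.y * v.y;   // placeholder computation
    }
}

// Hypothetical host helper: copies the vectors to the device, launches
// vsize blocks of 1 thread each, and copies the results back.
// Note: on compute capability 2.x the grid x-dimension is limited to
// 65535 blocks, so vsize must not exceed that with this launch.
std::vector<float> squaredNormsOneThreadPerBlock(const std::vector<float2>& h_in)
{
    int vsize = static_cast<int>(h_in.size());
    std::vector<float> h_out(vsize);
    if (vsize == 0) return h_out;

    float2* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, vsize * sizeof(float2));
    cudaMalloc(&d_out, vsize * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), vsize * sizeof(float2), cudaMemcpyHostToDevice);

    mykernel<<<vsize, 1>>>(d_in, d_out, vsize);   // gridsize = vsize, 1 thread per block

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, vsize * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling squaredNormsOneThreadPerBlock with the vectors (3, 4) and (1, 2) should give 25 and 5. If each "2D vector" were instead a small 2D array, the data would still need to live in linear device memory, and the single thread would loop over its elements itself.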
