kostaspap

Reputation: 344

A single thread on CUDA

I am invoking a CUDA kernel with only one block and only one thread inside this block, e.g.

kernel<<<1, 1>>>();

Will this kernel be executed only on a single CUDA core as specified? So for instance if the GPU has 128 cores, only 1 of the 128 will be working?

Thanks a lot!

Upvotes: 3

Views: 2983

Answers (2)

talonmies

Reputation: 72349

No. CUDA is a SIMD-style architecture (NVIDIA's term is SIMT), and the basic execution unit is a warp: a group of 32 threads that execute in lockstep on the hardware. If you launch a single block containing a single thread, the hardware still executes one warp of 32 threads, 31 of which are masked out and execute the equivalent of a stream of no-ops. Any given warp executes on a single streaming multiprocessor (SM), and depending on the generation of hardware you are using, that might involve 8, 16 or 32 cores of the SM on which it runs.
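To see the masking described above, here is a minimal sketch (not part of the original answer) that launches one block of one thread and asks the warp which lanes are active. The kernel name `record_active_mask` and the helper `active_lanes_for_single_thread_launch` are made up for illustration; `__activemask()` is a real intrinsic but assumes CUDA 9 or newer. On a typical device the helper would be expected to report a single active lane, while `warpSize` still reports 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores the mask of lanes that are active in the
// calling warp at this point. Bit i is set if lane i is active.
__global__ void record_active_mask(unsigned int *mask_out)
{
    *mask_out = __activemask();
}

// Hypothetical helper: launches one block of one thread and returns the
// number of active lanes in its warp, or -1 if a CUDA call fails.
int active_lanes_for_single_thread_launch()
{
    unsigned int *d_mask = nullptr;
    unsigned int h_mask = 0;

    if (cudaMalloc(&d_mask, sizeof(unsigned int)) != cudaSuccess) {
        return -1;
    }

    record_active_mask<<<1, 1>>>(d_mask);

    // cudaMemcpy waits for the kernel to finish and also surfaces launch errors.
    cudaError_t err = cudaMemcpy(&h_mask, d_mask, sizeof(h_mask),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_mask);
    if (err != cudaSuccess) {
        return -1;
    }

    // Count the set bits in the mask.
    int count = 0;
    for (unsigned int m = h_mask; m != 0; m >>= 1) {
        count += static_cast<int>(m & 1u);
    }
    return count;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }
    std::printf("warp size: %d, SM count: %d\n",
                prop.warpSize, prop.multiProcessorCount);
    std::printf("active lanes for <<<1, 1>>>: %d\n",
                active_lanes_for_single_thread_launch());
    return 0;
}
```

The warp is still scheduled as a whole; the mask only shows how many of its lanes do useful work.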

Upvotes: 8

lashgar

Reputation: 5430

Each CUDA core is a lane in an SM's SIMD unit. A kernel<<<1, 1>>> launch activates only one SM and uses only one of its lanes, which makes it very inefficient.
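For contrast, a launch that fills whole warps and spreads blocks across SMs might look like the sketch below. This is an illustration rather than anything from the answer: the `scale` kernel, the `run_scale` helper, the block size of 256 and the array length are all assumed values chosen for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread scales one element of the array.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard the last, possibly partial, block
        data[i] *= factor;
    }
}

// Hypothetical helper: doubles n ones on the device and returns true if
// every element comes back as 2.0f.
bool run_scale(int n, int threads_per_block)
{
    std::vector<float> host(n, 1.0f);
    float *dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        return false;
    }
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    // Enough blocks to cover all n elements. A block size that is a
    // multiple of 32 keeps every warp fully populated except the last.
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    scale<<<blocks, threads_per_block>>>(dev, n, 2.0f);

    cudaError_t err = cudaMemcpy(host.data(), dev, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        return false;
    }

    for (float v : host) {
        if (v != 2.0f) {
            return false;
        }
    }
    return true;
}

int main()
{
    bool ok = run_scale(1 << 20, 256);
    std::printf("scale over many blocks: %s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

With many blocks in flight, the scheduler can place them on different SMs, so far more than one lane of one SM is kept busy.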

Upvotes: 2
