Reputation: 344
I am invoking a CUDA kernel with only one block and only one thread inside this block, e.g.
kernel<<<1, 1>>>
Will this kernel be executed on only a single CUDA core, as specified? For instance, if the GPU has 128 cores, will only 1 of the 128 be working?
Thanks a lot!
Upvotes: 3
Views: 2983
Reputation: 72349
No. CUDA is a SIMD-style architecture (NVIDIA calls it SIMT), and the basic execution unit is a warp: a group of 32 threads that execute in lockstep on the hardware. If you launch a single block containing a single thread, the hardware still executes one warp of 32 threads, 31 of which are masked out and execute the equivalent of a stream of no-ops. Any given warp runs on a single streaming multiprocessor (SM), and depending on the generation of hardware you are using, that might involve 8, 16 or 32 cores of the SM on which it runs.
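One way to see the masking described above is to ask the warp which of its lanes are active. The following is a minimal sketch, not a definitive test: the kernel name `report_active_lanes` and the variable names are made up for illustration, and it assumes CUDA 9 or later, where the `__activemask()` intrinsic is available. With a `<<<1, 1>>>` launch it should report a single active lane out of the 32 in the warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel name, used only for this illustration.
__global__ void report_active_lanes(unsigned *mask_out)
{
    // __activemask() returns a 32-bit mask with one bit set for each lane
    // of the current warp that is executing this instruction.
    *mask_out = __activemask();
}

int main()
{
    unsigned *d_mask = nullptr;
    unsigned h_mask = 0;

    if (cudaMalloc(&d_mask, sizeof(unsigned)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block containing one thread, as in the question.
    report_active_lanes<<<1, 1>>>(d_mask);

    // cudaMemcpy waits for the kernel to finish before copying the result.
    cudaError_t err = cudaMemcpy(&h_mask, d_mask, sizeof(unsigned),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_mask);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count the set bits on the host to get the number of active lanes.
    int active = 0;
    for (unsigned m = h_mask; m != 0; m >>= 1) {
        active += static_cast<int>(m & 1u);
    }

    printf("active mask = 0x%08x, active lanes = %d of 32\n", h_mask, active);
    return 0;
}
```

Changing the launch to, say, `<<<1, 32>>>` and having only lane 0 write the result would be a natural way to compare against a fully populated warp.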
Upvotes: 8
Reputation: 5430
Each CUDA core is a lane in an SM's SIMD unit. Your kernel activates only one SM and uses only one of its lanes, so kernel<<<1,1>>> is very inefficient.
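To put a rough number on that inefficiency for a particular GPU, the runtime API can report how many SMs the device has and how wide a warp is. This is only a sketch: it assumes device 0 is the GPU of interest, and it does not print the number of cores per SM, since that figure is not a field of `cudaDeviceProp` and depends on the hardware generation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;

    // Assumption: device 0 is the GPU we care about.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: %d SMs, warp size %d\n",
           prop.name, prop.multiProcessorCount, prop.warpSize);

    // A <<<1, 1>>> launch creates one block, which is scheduled on one SM
    // as a single warp with a single active lane.
    printf("A <<<1, 1>>> launch occupies 1 of %d SMs and uses 1 of %d lanes "
           "in its warp.\n",
           prop.multiProcessorCount, prop.warpSize);
    return 0;
}
```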
Upvotes: 2