Reputation: 117
I was testing the CUDA occupancy calculator and, on purpose, tried a configuration of one block with a single thread. The spreadsheet gave me:
Active Threads per Multiprocessor:32
Active Warps per Multiprocessor:1
I understand why the number of warps is 1, but I was expecting 1 as the number of active threads per SM. Does this mean a warp will be generated in which 31 results won't be committed? I doubt that is the case, but I want to confirm it.
Cheers
Upvotes: 2
Views: 375
Reputation: 117
I created a simple test program in which I declared an array 32 words long. The kernel code is simple: d_a[tid] = 2*d_a[tid]; I launched the kernel with one thread only. When displaying the result, only d_a[0] was scaled correctly; accessing the other elements of my array displayed an error. That tells me one warp was indeed scheduled, but it had ONLY one thread active and not 32, hence my question and my confusion.
Upvotes: 0
Reputation: 152164
The basic unit of scheduling in today's GPUs is the warp, not the thread. Therefore it does not matter whether you specify only one thread or all 32: the warp is consumed for scheduling purposes in the same way.
In this case, I would say "Active Threads" is referring to all threads that are associated with Active Warps. Some of those threads may be doing nothing depending on your block configuration and/or actual thread code, but nevertheless those threads are involved in the scheduled warps.
Yes, if you want to run even just one thread, it requires an entire warp.
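To make the arithmetic concrete, here is a rough sketch in plain host-side C++ (no GPU required) of how a thread count rounds up to whole warps. It assumes a warp size of 32 and a single resident block, as in the question. The names `kWarpSize`, `warps_per_block`, and `reported_active_threads` are made up for illustration and are not part of the CUDA API or the spreadsheet.

```cpp
// Assumption: warp size of 32, matching the figures in the question.
constexpr int kWarpSize = 32;

// Warps needed by one block: the thread count rounded up to whole warps.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Thread slots tied to those warps, whether or not a real thread fills them.
constexpr int reported_active_threads(int threads_per_block) {
    return warps_per_block(threads_per_block) * kWarpSize;
}

// One block of one thread: 1 warp, 32 thread slots.
static_assert(warps_per_block(1) == 1, "one thread still needs a whole warp");
static_assert(reported_active_threads(1) == 32, "32 slots belong to that warp");
```

Under these assumptions, a block of 33 threads would likewise occupy 2 warps, that is, 64 thread slots.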
This is one reason why grid configurations that have a 1 in either position:
my_kernel<<<N, 1>>>();
or
my_kernel<<<1, N>>>();
are going to be inefficient in their use of GPU resources.
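As a rough illustration of the cost, the host-side C++ sketch below estimates what fraction of the allocated warp lanes holds a real thread for a given block size. It assumes 32 lanes per warp and a positive block size; `kLanesPerWarp` and `lane_utilization` are hypothetical names, not CUDA API. With `<<<N, 1>>>` every block has one thread, so each warp uses 1 lane out of 32. The `<<<1, N>>>` case can fill its warps, but a single block runs on a single multiprocessor, which this sketch does not model.

```cpp
// Assumption: 32 lanes per warp.
constexpr int kLanesPerWarp = 32;

// Fraction of allocated warp lanes that hold a real thread.
// threads_per_block must be positive.
constexpr double lane_utilization(int threads_per_block) {
    return static_cast<double>(threads_per_block) /
           (((threads_per_block + kLanesPerWarp - 1) / kLanesPerWarp) * kLanesPerWarp);
}

// One thread per block, as in my_kernel<<<N, 1>>>(): 1 of 32 lanes used.
static_assert(lane_utilization(1) == 1.0 / 32.0, "1 of 32 lanes is used");
// A block size that is a multiple of 32 fills every lane of its warps.
static_assert(lane_utilization(256) == 1.0, "all lanes are used");
```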
Upvotes: 2