Reputation: 4663
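#include <cstdint>
#include <cstdio>
#include <cstdlib>

// gpuErrchk was not defined in the post; a minimal version is assumed here.
#define gpuErrchk(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)

__device__ int arr[1]{5};

__global__ void kernel()
{
    // Global thread index; with a <<<1, 1>>> launch this is always 0.
    uint32_t threadId = blockDim.x * blockIdx.x + threadIdx.x;
    // Bitmask of the lanes in this warp that are currently active.
    uint32_t mask = __activemask();
    printf("id %u, mask %u, value %i\n", threadId, mask, arr[threadId]);
}

int main()
{
    kernel<<<1, 1>>>();                  // launch a single thread
    gpuErrchk(cudaPeekAtLastError());    // catch launch errors
    gpuErrchk(cudaDeviceSynchronize());  // wait for the kernel and flush printf
    return 0;
}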
In the code above, a single thread is launched, and the active mask confirms that only one thread is active. But because of the nature of NVIDIA's GPUs, a complete warp executes the kernel (32 threads instead of 1). I imagine that each thread in a warp is assigned a thread id and, based on this id, accesses a certain part of memory.
How are memory operations performed for inactive threads? Do reads and writes actually get executed? If they do, that could lead to out-of-bounds reads or writes, as in my example, where the size of the array is 1.
Upvotes: 1
Views: 277
Reputation: 152164
Inactive threads are masked in such a way that no actual work is done. Although it doesn't cover your case exactly, an example of the handling is given here:
If threads of a warp diverge via a data-dependent conditional branch, the warp executes each branch path taken, disabling threads that are not on that path.
Disabling a thread means that the thread won't do anything that is program-meaningful.
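As a rough illustration of the divergence case, here is a minimal sketch in which one full warp splits on its lane index. The names `diverge` and `run_diverge` are hypothetical helpers, not part of any CUDA API. Each lane records which path it took and the `__activemask()` value it observed there. The exact mask values are not guaranteed by the programming model, so the sketch only relies on each lane seeing its own bit set.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Each lane takes one of two paths based on its lane index, then records
// which path it ran and the active mask it observed on that path.
__global__ void diverge(int *path, uint32_t *mask)
{
    uint32_t lane = threadIdx.x;
    if (lane < 16) {
        path[lane] = 1;               // only lanes 0..15 execute this store
        mask[lane] = __activemask();
    } else {
        path[lane] = 2;               // only lanes 16..31 execute this store
        mask[lane] = __activemask();
    }
}

// Hypothetical host helper: launches one warp of 32 threads and copies the
// results into two caller-provided arrays of 32 elements each.
// Returns true if every CUDA call succeeded.
bool run_diverge(int *path_out, uint32_t *mask_out)
{
    int *d_path = nullptr;
    uint32_t *d_mask = nullptr;
    if (cudaMalloc(&d_path, 32 * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_mask, 32 * sizeof(uint32_t)) != cudaSuccess) {
        cudaFree(d_path);
        return false;
    }
    diverge<<<1, 32>>>(d_path, d_mask);
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaDeviceSynchronize() == cudaSuccess
           && cudaMemcpy(path_out, d_path, 32 * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess
           && cudaMemcpy(mask_out, d_mask, 32 * sizeof(uint32_t),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_path);
    cudaFree(d_mask);
    return ok;
}
```

Calling `run_diverge` from `main` with two 32-element host arrays should show every lane holding only the value from its own path, even though the warp as a whole walked through both branches.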
We can get some coverage for your case with the additional description here:
There are two reasons threads within a warp can be disabled: being inactive, and being predicated off. If the block size is not a multiple of the warp size, the last warp in the block will have inactive threads.
So the "extra threads" in the last warp of a block whose size is not a multiple of 32 will also be inactive. In your example the block has 1 thread, so the other 31 lanes of its warp are inactive.
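One way to check this for yourself is to count how many threads actually run the kernel body. The following is a minimal sketch, with `count_threads` and `run_count` as hypothetical names chosen for illustration: every thread that executes increments a counter once, so a block of 40 threads (one full warp plus a partial one) should produce 40, not 64.

```cuda
#include <cuda_runtime.h>

// Every thread that really executes the kernel adds 1 to the counter.
__global__ void count_threads(unsigned int *counter)
{
    atomicAdd(counter, 1u);
}

// Hypothetical host helper: launches a single block of block_size threads
// and stores the final counter value in *result.
// Returns true if every CUDA call succeeded.
bool run_count(unsigned int block_size, unsigned int *result)
{
    unsigned int *d_counter = nullptr;
    if (cudaMalloc(&d_counter, sizeof(unsigned int)) != cudaSuccess) return false;
    bool ok = cudaMemset(d_counter, 0, sizeof(unsigned int)) == cudaSuccess;
    if (ok) {
        count_threads<<<1, block_size>>>(d_counter);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(result, d_counter, sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_counter);
    return ok;
}
```

With `run_count(40, &n)`, the 24 unused lanes of the second warp never perform the `atomicAdd`, so `n` ends up equal to the launched thread count.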
Inactive threads don't do anything that is meaningful to your program state; they are disabled.
You don't need to be concerned about an inactive thread making a memory request that you didn't intend.
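To see this for the memory case, here is a minimal sketch under a few assumptions: the names `write_lane` and `run_write` are hypothetical, and the buffer is deliberately sized for a full warp (32 ints) so that any stray write from an inactive lane would be visible rather than out of bounds. The buffer is filled with a sentinel, the kernel is launched with fewer than 32 threads, and the host then inspects which elements changed.

```cuda
#include <cuda_runtime.h>

// Each executing thread writes 1 to the slot matching its lane index.
__global__ void write_lane(int *out)
{
    out[threadIdx.x] = 1;
}

// Hypothetical host helper: fills a 32-int device buffer with -1, launches
// num_threads threads (1..32) in one block, and copies the buffer to out[32].
// Returns true if every CUDA call succeeded.
bool run_write(unsigned int num_threads, int *out)
{
    const size_t bytes = 32 * sizeof(int);
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;
    // Setting every byte to 0xFF makes each int equal to -1 (the sentinel).
    bool ok = cudaMemset(d_out, 0xFF, bytes) == cudaSuccess;
    if (ok) {
        write_lane<<<1, num_threads>>>(d_out);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);
    return ok;
}
```

After `run_write(1, out)`, only `out[0]` should be 1 and the remaining 31 elements should still hold the sentinel, which is consistent with the inactive lanes issuing no stores.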
Likewise, even if it didn't involve memory, you don't need to be concerned about an inactive thread doing any other work, such as multiplying two registers and modifying another register.
Upvotes: 3