Vivek Mahto

How can I find out which thread is getting executed on which core of the GPU?

I'm developing some simple programs in CUDA and I want to know which thread is executing on which core of the GPU. I'm using Visual Studio 2012 and I have an NVIDIA GeForce 610M graphics card.

Is it possible to do this? I've already searched extensively on Google, but without success.

EDIT:

I know this is an unusual request, but my college project guide asked me to do it.

Answers (1)

Robert Crovella

Combining information from the PTX manual and a simple inline-PTX wrapper, the following functions should give you what you need:

static __device__ __inline__ uint32_t __mysmid(){
  uint32_t smid;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
  return smid;}

The function above tells you which streaming multiprocessor (SM) the calling thread is executing on.
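As a sketch of how the SM ID might be used beyond printing, the program below has thread 0 of each block record its SM ID, then tallies blocks per SM on the host. It also reads the `%nsmid` special register, which the PTX manual describes as the upper bound on SM IDs; because SM IDs are not guaranteed to be contiguous, `%nsmid` can be larger than the number of physical SMs. The names `read_smid`, `read_nsmid`, `record_block_sm` and `count_per_sm` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>

// Same inline-PTX pattern as __mysmid(); the name is illustrative.
static __device__ __inline__ uint32_t read_smid() {
  uint32_t v;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(v));
  return v;
}

// %nsmid: upper bound on SM IDs (the IDs themselves may have gaps).
static __device__ __inline__ uint32_t read_nsmid() {
  uint32_t v;
  asm volatile("mov.u32 %0, %%nsmid;" : "=r"(v));
  return v;
}

// Thread 0 of every block records the SM that block is running on.
__global__ void record_block_sm(uint32_t *block_sm, uint32_t *nsmid) {
  if (threadIdx.x == 0) {
    block_sm[blockIdx.x] = read_smid();
    if (blockIdx.x == 0) *nsmid = read_nsmid();
  }
}

// Host-side tally: how many blocks landed on each SM ID in [0, nsmid).
std::vector<int> count_per_sm(const std::vector<uint32_t> &block_sm, uint32_t nsmid) {
  std::vector<int> counts(nsmid, 0);
  for (uint32_t s : block_sm)
    if (s < nsmid) ++counts[s];
  return counts;
}

int main() {
  const int nblocks = 16;
  uint32_t *d_sm = nullptr, *d_nsmid = nullptr;
  cudaMalloc(&d_sm, nblocks * sizeof(uint32_t));
  cudaMalloc(&d_nsmid, sizeof(uint32_t));

  record_block_sm<<<nblocks, 32>>>(d_sm, d_nsmid);

  std::vector<uint32_t> h_sm(nblocks);
  uint32_t h_nsmid = 0;
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(h_sm.data(), d_sm, nblocks * sizeof(uint32_t), cudaMemcpyDeviceToHost);
  cudaMemcpy(&h_nsmid, d_nsmid, sizeof(uint32_t), cudaMemcpyDeviceToHost);

  std::vector<int> counts = count_per_sm(h_sm, h_nsmid);
  for (uint32_t s = 0; s < h_nsmid; ++s)
    if (counts[s] > 0) printf("SM %u ran %d block(s)\n", s, counts[s]);

  cudaFree(d_sm);
  cudaFree(d_nsmid);
  return 0;
}
```

Only SM IDs that actually received at least one block are printed, so gaps in the numbering simply don't show up.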

static __device__ __inline__ uint32_t __mywarpid(){
  uint32_t warpid;
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
  return warpid;}

The function above tells you which warp the calling thread belongs to.
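The PTX manual describes `%warpid` as intended mainly for profiling and diagnostics, and recommends computing a virtual warp index from `%tid` (that is, `threadIdx`) when kernel code needs a stable value. A minimal sketch comparing the two, assuming a warp size of 32; `read_warpid`, `linear_tid` and `virtual_warp_index` are hypothetical helper names:

```cuda
#include <cstdio>
#include <cstdint>

// Same inline-PTX pattern as __mywarpid(); the name is illustrative.
static __device__ __inline__ uint32_t read_warpid() {
  uint32_t v;
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(v));
  return v;
}

// Linear thread index within a block of shape (bdx, bdy, *).
__host__ __device__ inline unsigned linear_tid(unsigned tx, unsigned ty, unsigned tz,
                                               unsigned bdx, unsigned bdy) {
  return tx + ty * bdx + tz * bdx * bdy;
}

// Stable warp index: each warp holds 32 consecutive linear thread IDs.
// Assumes a warp size of 32.
__host__ __device__ inline unsigned virtual_warp_index(unsigned tx, unsigned ty, unsigned tz,
                                                       unsigned bdx, unsigned bdy) {
  return linear_tid(tx, ty, tz, bdx, bdy) / 32u;
}

__global__ void compare_warp_ids() {
  unsigned t = linear_tid(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y);
  unsigned hw = read_warpid();  // hardware value at the moment of the read
  if (t % 32u == 0)             // print one line per warp
    printf("block %u: virtual warp %u, %%warpid %u\n", blockIdx.x,
           virtual_warp_index(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y), hw);
}

int main() {
  compare_warp_ids<<<2, dim3(16, 4)>>>();  // 64 threads per block, i.e. 2 warps
  cudaDeviceSynchronize();
  return 0;
}
```

The computed index depends only on the thread's position in the block, while the `%warpid` value printed next to it reflects the hardware at read time and need not match it.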

static __device__ __inline__ uint32_t __mylaneid(){
  uint32_t laneid;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(laneid));
  return laneid;}

The function above tells you which lane (0 to 31) of its warp the calling thread occupies.
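Since warps are formed from consecutive linear thread IDs, the expected lane for a block of any shape is the linear thread index modulo 32. The sketch below, assuming a warp size of 32 and using hypothetical names (`read_laneid`, `expected_lane`, `check_lanes`), counts any threads whose `%laneid` disagrees with that computation instead of assuming the two always agree:

```cuda
#include <cstdio>
#include <cstdint>

// Same inline-PTX pattern as __mylaneid(); the name is illustrative.
static __device__ __inline__ uint32_t read_laneid() {
  uint32_t v;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(v));
  return v;
}

// Lane computed from the linear thread index. Assumes a warp size of 32.
__host__ __device__ inline unsigned expected_lane(unsigned tx, unsigned ty, unsigned tz,
                                                  unsigned bdx, unsigned bdy) {
  return (tx + ty * bdx + tz * bdx * bdy) % 32u;
}

// Counts threads whose hardware %laneid differs from the computed lane.
__global__ void check_lanes(unsigned *mismatches) {
  unsigned want = expected_lane(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y);
  if (read_laneid() != want) atomicAdd(mismatches, 1u);
}

int main() {
  unsigned *d_count = nullptr, h_count = 0;
  cudaMalloc(&d_count, sizeof(unsigned));
  cudaMemset(d_count, 0, sizeof(unsigned));
  check_lanes<<<4, dim3(8, 8, 2)>>>(d_count);  // 3D blocks of 128 threads
  cudaMemcpy(&h_count, d_count, sizeof(unsigned), cudaMemcpyDeviceToHost);
  printf("lane mismatches: %u\n", h_count);
  cudaFree(d_count);
  return 0;
}
```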

Note that in the case of dynamic parallelism (and possibly other scenarios such as debugging), this information is volatile and may change during program execution.

Refer to the CUDA programming guide for definitions of terms such as multiprocessor and warp.

Here is a fully-worked example:

$ cat t646.cu
#include <stdio.h>
#include <stdint.h>

static __device__ __inline__ uint32_t __mysmid(){
  uint32_t smid;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
  return smid;}

static __device__ __inline__ uint32_t __mywarpid(){
  uint32_t warpid;
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(warpid));
  return warpid;}

static __device__ __inline__ uint32_t __mylaneid(){
  uint32_t laneid;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(laneid));
  return laneid;}


__global__ void mykernel(){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  printf("I am thread %d, my SM ID is %u, my warp ID is %u, and my warp lane is %u\n", idx, __mysmid(), __mywarpid(), __mylaneid());
}

int main(){

  mykernel<<<4,4>>>();
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc -arch=sm_20 -o t646 t646.cu
$ ./t646
I am thread 0, my SM ID is 0, my warp ID is 0, and my warp lane is 0
I am thread 1, my SM ID is 0, my warp ID is 0, and my warp lane is 1
I am thread 2, my SM ID is 0, my warp ID is 0, and my warp lane is 2
I am thread 3, my SM ID is 0, my warp ID is 0, and my warp lane is 3
I am thread 8, my SM ID is 3, my warp ID is 0, and my warp lane is 0
I am thread 9, my SM ID is 3, my warp ID is 0, and my warp lane is 1
I am thread 10, my SM ID is 3, my warp ID is 0, and my warp lane is 2
I am thread 11, my SM ID is 3, my warp ID is 0, and my warp lane is 3
I am thread 12, my SM ID is 4, my warp ID is 0, and my warp lane is 0
I am thread 13, my SM ID is 4, my warp ID is 0, and my warp lane is 1
I am thread 14, my SM ID is 4, my warp ID is 0, and my warp lane is 2
I am thread 15, my SM ID is 4, my warp ID is 0, and my warp lane is 3
I am thread 4, my SM ID is 1, my warp ID is 0, and my warp lane is 0
I am thread 5, my SM ID is 1, my warp ID is 0, and my warp lane is 1
I am thread 6, my SM ID is 1, my warp ID is 0, and my warp lane is 2
I am thread 7, my SM ID is 1, my warp ID is 0, and my warp lane is 3
$

Note that the above output will vary depending on what kind of GPU you are running on. Don't expect your output to be exactly like the above.
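Device-side `printf` output from different blocks is not ordered; in the run above, threads 8 to 15 print before threads 4 to 7. One alternative, sketched here with a hypothetical `ThreadPlace` struct and `read_place` helper, is to write each thread's values to global memory and print them from the host in thread-index order. The three registers are read one after another, so each record is only a snapshot:

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>

// Hypothetical record of where one thread was observed running.
struct ThreadPlace {
  uint32_t smid, warpid, laneid;
};

// Reads all three special registers with the same inline-PTX pattern.
__device__ ThreadPlace read_place() {
  ThreadPlace p;
  asm volatile("mov.u32 %0, %%smid;"   : "=r"(p.smid));
  asm volatile("mov.u32 %0, %%warpid;" : "=r"(p.warpid));
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(p.laneid));
  return p;
}

// Each thread stores its record at its global index instead of calling printf.
__global__ void record_places(ThreadPlace *out, int n) {
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  if (idx < n) out[idx] = read_place();
}

int main() {
  const int blocks = 4, threads = 4, n = blocks * threads;
  ThreadPlace *d_out = nullptr;
  cudaMalloc(&d_out, n * sizeof(ThreadPlace));
  record_places<<<blocks, threads>>>(d_out, n);

  std::vector<ThreadPlace> h(n);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(h.data(), d_out, n * sizeof(ThreadPlace), cudaMemcpyDeviceToHost);
  cudaFree(d_out);

  // Printed from the host, so lines come out in thread-index order.
  for (int i = 0; i < n; ++i)
    printf("thread %d: SM %u, warp %u, lane %u\n", i, h[i].smid, h[i].warpid, h[i].laneid);
  return 0;
}
```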
