Breakpoints inside CUDA kernel __global__ not hitting

Question

Using visual studios 2010. Win 7. Nsight 2.1

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// incrementArray.cu
#include 
#include 

void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;

  int j = idx;
  int i = 2;

  i = i+j; //->breakpoint here

  if (idxbreakpoint here
}
int main(void)
{
  float *a_h, *b_h;           // pointers to host memory
  float *a_d;                 // pointer to device memory
  int i, N = 10;
  size_t size = N*sizeof(float);
  // allocate arrays on host
  a_h = (float *)malloc(size);
  b_h = (float *)malloc(size);
  // allocate array on device 
  cudaMalloc((void **) &a_d, size);
  // initialization of host data
  for (i=0; i>> (a_d, N);
  // Retrieve result from device and store in b_h
  cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // check results
  for (i=0; i



I've tried inserting breakpoints as listed above inside my global void incrementArrayOnDevice(float *a, int N) but they're not hitting.

When I run debug (f5) in visual studios, I tried to step into incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N); but they would skip the entire kernel code section.

tried to add a watch on the variables i and j but there was an error "CXX0017: Error: symbol "i" not found."

Is this issue normal? Can someone please try on their pc and let me know if they can hit the breakpoints? If you can, what possible problem could mine be? Please help! :(

Programmer · Accepted Answer

Nsight debugging is different from VS debugging . You need to use Nsight debugging to hit the kernel breakpoints. However, for this you need 2 GPU cards. Do you have 2 cards in the first place? Please check

Breakpoints inside CUDA kernel global not hitting

Answers (2)

Related Questions