f0rfun

Reputation: 757

Breakpoints inside CUDA kernel __global__ not hitting

I'm using Visual Studio 2010 on Windows 7 with Nsight 2.1.

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// incrementArray.cu
#include <stdio.h>
#include <stdlib.h>  // malloc, free
#include <assert.h>

void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;

  int j = idx;
  int i = 2;

  i = i+j; //->breakpoint here

  if (idx<N) a[idx] = a[idx]+1.f; //->breakpoint here
}
int main(void)
{
  float *a_h, *b_h;           // pointers to host memory
  float *a_d;                 // pointer to device memory
  int i, N = 10;
  size_t size = N*sizeof(float);
  // allocate arrays on host
  a_h = (float *)malloc(size);
  b_h = (float *)malloc(size);
  // allocate array on device 
  cudaMalloc((void **) &a_d, size);
  // initialization of host data
  for (i=0; i<N; i++) a_h[i] = (float)i;
  // copy data from host to device
  cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
  // do calculation on host
  incrementArrayOnHost(a_h, N);
  // do calculation on device:
  // Part 1 of 2. Compute execution configuration
  int blockSize = 4;
  int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
  // Part 2 of 2. Call incrementArrayOnDevice kernel 
  incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
  // Retrieve result from device and store in b_h
  cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // check results
  for (i=0; i<N; i++) assert(a_h[i] == b_h[i]);
  // cleanup
  free(a_h); free(b_h); cudaFree(a_d);

  return 0;
}

I've tried setting breakpoints at the lines marked above inside my __global__ void incrementArrayOnDevice(float *a, int N) kernel, but they are never hit.

When I start debugging (F5) in Visual Studio and try to step into incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N); the debugger skips the entire kernel.

I also tried to add a watch on the variables i and j, but got the error "CXX0017: Error: symbol "i" not found."

Is this normal? Could someone please try this on their PC and let me know whether the breakpoints are hit? If they are, what could be wrong with my setup? Please help! :(

Upvotes: 0

Views: 1269

Answers (2)

Antun Tun

Reputation: 1539

You can debug on a single GPU, but only under the following conditions:

  1. You have to be using the CUDA 5.0 toolkit
  2. You have to be programming on a GPU that supports the 303.xx NForceWare driver or higher
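
One way to check the first condition is to ask the CUDA runtime which versions it sees. This is only a minimal sketch, assuming the CUDA toolkit headers are installed; the file name and the helpers versionMajor and versionMinor are made up for illustration. Note that cudaDriverGetVersion reports the newest CUDA version the installed driver supports, not the display driver number (303.xx), which has to be looked up separately.

```cuda
// version_check.cu (hypothetical file name); build with: nvcc version_check.cu
#include <stdio.h>
#include "cuda_runtime.h"

// CUDA encodes versions as 1000*major + 10*minor, e.g. 5000 for 5.0.
// These two helpers are illustrative, not part of the CUDA API.
static int versionMajor(int v) { return v / 1000; }
static int versionMinor(int v) { return (v % 1000) / 10; }

int main(void)
{
  int runtimeVersion = 0;  // version of the CUDA runtime the program was built against
  int driverVersion = 0;   // newest CUDA version supported by the installed driver

  if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess ||
      cudaDriverGetVersion(&driverVersion) != cudaSuccess) {
    printf("Could not query CUDA versions\n");
    return 1;
  }

  printf("CUDA runtime: %d.%d\n",
         versionMajor(runtimeVersion), versionMinor(runtimeVersion));
  printf("CUDA version supported by driver: %d.%d\n",
         versionMajor(driverVersion), versionMinor(driverVersion));
  return 0;
}
```

If the runtime line shows something lower than 5.0, the toolkit condition above is not met.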

Upvotes: 0

Programmer

Reputation: 6753

Nsight debugging is different from regular Visual Studio debugging. You need to start the Nsight debugger to hit breakpoints inside a kernel. For this, however, you need two GPU cards. Do you have two cards in the first place? Please check.
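
To check how many CUDA-capable cards the runtime can see, something like the following sketch should work. It assumes the CUDA toolkit is installed; countCudaDevices is a made-up helper name, not a CUDA API call.

```cuda
// list_devices.cu (hypothetical file name); build with: nvcc list_devices.cu
#include <stdio.h>
#include "cuda_runtime.h"

// Returns the number of CUDA devices visible to the runtime, or 0 on error.
static int countCudaDevices(void)
{
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
  return count;
}

int main(void)
{
  int n = countCudaDevices();
  printf("CUDA devices found: %d\n", n);

  // Print the name and compute capability of each visible device.
  for (int d = 0; d < n; d++) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, d) == cudaSuccess)
      printf("  device %d: %s (compute %d.%d)\n", d, prop.name, prop.major, prop.minor);
  }
  return 0;
}
```

This only lists the GPUs that CUDA can use; whether Nsight can debug on them still depends on the setup described above.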

Upvotes: 1
