dj_doppelganger

Reputation: 1

Basic CUDA - getting kernels to run on the device using C++

I'm new to CUDA & trying to get a basic kernel to run on the device. I have compiled the examples & run them, so I know the device drivers work & CUDA can run successfully. My goal is to have my C++ code call CUDA to greatly speed up a task. I've been reading over a bunch of different posts online about how to do this. Specifically, [here]: Can I call CUDA runtime function from C++ code not compiled by nvcc?.

My question is very simple (embarrassingly so): when I compile & run my code (posted below) I get no errors, but the kernel does not appear to run. This should be trivial to fix, but after 6 hours I'm at a loss. I'd post this on the NVIDIA forums but they're still down :/. I'm sure the answer is very basic - any help? Below is my code, how I compile it, & the terminal output I see:

main.cpp

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern void kernel_wrapper(int *a, int *b);

int main(int argc, char *argv[]){
  int a = 2;
  int b = 3;

  printf("Input: a = %d, b = %d\n", a, b);
  kernel_wrapper(&a, &b);
  printf("Ran: a = %d, b = %d\n", a, b);
  return 0;
}

kernel.cu

#include "cuPrintf.cu"
#include <stdio.h>
__global__ void kernel(int *a, int *b){
  int tx = threadIdx.x;
  cuPrintf("tx = %d\n", tx);
  switch( tx ){
  case 0:
    *a = *a + 10;
    break;
  case 1:
    *b = *b + 3;
    break;
  default:
    break;
  }
}

void kernel_wrapper(int *a, int *b){
  cudaPrintfInit();
  //cuPrintf("Anything...?");
  printf("Anything...?\n");
  int *d_1, *d_2;
  dim3 threads( 2, 1 );
  dim3 blocks( 1, 1 );

  cudaMalloc( (void **)&d_1, sizeof(int) );
  cudaMalloc( (void **)&d_2, sizeof(int) );

  cudaMemcpy( d_1, a, sizeof(int), cudaMemcpyHostToDevice );
  cudaMemcpy( d_2, b, sizeof(int), cudaMemcpyHostToDevice );

  kernel<<< blocks, threads >>>( a, b );
  cudaMemcpy( a, d_1, sizeof(int), cudaMemcpyDeviceToHost );
  cudaMemcpy( b, d_2, sizeof(int), cudaMemcpyDeviceToHost );
  printf("Output: a = %d\n", a[0]);
  cudaFree(d_1);
  cudaFree(d_2);

  cudaPrintfDisplay(stdout, true);
  cudaPrintfEnd();
}

I compile the above code from the terminal using the commands:

g++ -c main.cpp
nvcc -c kernel.cu -I/home/clj/NVIDIA_GPU_Computing_SDK/C/src/simplePrintf
nvcc -o main main.o kernel.o

When I run the code I get the following terminal output:

$./main
Input: a = 2, b = 3
Anything...?
Output: a = 2
Ran: a = 2, b = 3

It's clear that main.cpp is being compiled correctly & is calling the kernel.cu code. The obvious problem is that the kernel does not appear to run. I'm sure the answer to this is basic - VERY VERY BASIC. But I don't know what's happening - help please?

Upvotes: 0

Views: 3341

Answers (1)

tropicana

Reputation: 1433

Inside kernel_wrapper you have the following call:

kernel<<< blocks, threads >>>( a, b );

You are passing the kernel pointers to variables that live on the host. The GPU cannot dereference host pointers; the arguments must point to memory that lives on the GPU. Passing d_1 and d_2 instead will solve the problem, and the result will be a = 12 and b = 6.

kernel<<< blocks, threads >>>( d_1, d_2 );
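For completeness, here is a sketch of what the corrected kernel_wrapper could look like. The error check and the cudaDeviceSynchronize() call are additions of mine (not in your original code), but they are good practice: a kernel launch is asynchronous and reports no errors by itself, so without them a failed launch looks exactly like the silent behavior you observed.

```cuda
#include <stdio.h>

void kernel_wrapper(int *a, int *b){
  int *d_1, *d_2;
  dim3 threads( 2, 1 );
  dim3 blocks( 1, 1 );

  cudaMalloc( (void **)&d_1, sizeof(int) );
  cudaMalloc( (void **)&d_2, sizeof(int) );

  cudaMemcpy( d_1, a, sizeof(int), cudaMemcpyHostToDevice );
  cudaMemcpy( d_2, b, sizeof(int), cudaMemcpyHostToDevice );

  // Pass the device pointers, not the host pointers
  kernel<<< blocks, threads >>>( d_1, d_2 );

  // Launches are asynchronous: check for launch errors and wait
  // for the kernel to finish before copying the results back
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
    printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
  cudaDeviceSynchronize();

  cudaMemcpy( a, d_1, sizeof(int), cudaMemcpyDeviceToHost );
  cudaMemcpy( b, d_2, sizeof(int), cudaMemcpyDeviceToHost );

  cudaFree(d_1);
  cudaFree(d_2);
}
```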

Upvotes: 3
