user513164

Reputation: 1868

cuda and c++ problem

Hi, I have a CUDA program that runs successfully. Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

    __global__ void square_array(float *a, int N)
    {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx<N) 
       a[idx] = a[idx] * a[idx];
    }

    int main(void)
    {
      float *a_h, *a_d; 
      const int N = 10;  
      size_t size = N * sizeof(float);
      a_h = (float *)malloc(size);        
      cudaMalloc((void **) &a_d, size);   
      for (int i=0; i<N; i++) a_h[i] = (float)i;
      cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
      int block_size = 4;
      int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
      square_array <<< n_blocks, block_size >>> (a_d, N);

      cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
      // Print results
      for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);

      free(a_h); 
      cudaFree(a_d);
    }

Now I want to split this code into two files: one for the C or C++ code and a .cu file for the kernel. I just want to do it for learning, and I don't want to write the same kernel code again and again. Can anyone tell me how to do this? How do I split this code into two different files? Then how do I compile it? How do I write a makefile for it? how to

Upvotes: 0

Views: 1359

Answers (2)

CygnusX1

Reputation: 21818

The biggest obstacle you will most likely encounter is how to call your kernel from your .cpp file. A C++ compiler will not understand the <<< >>> syntax. There are 3 ways of doing it:

  • Just write a small encapsulating host function in your .cu file

  • Use CUDA library functions (cudaConfigureCall, cudaFuncGetAttributes, cudaLaunch); see the "Execution Control" chapter of the CUDA Reference Manual for details. You can call these functions from plain C++ code, as long as you include the CUDA headers and link against the CUDA runtime library. (In newer CUDA releases, cudaConfigureCall and cudaLaunch are superseded by cudaLaunchKernel.)

  • Load PTX at runtime. This is harder, but it lets you manipulate the PTX code at runtime. This JIT approach is explained in the CUDA Programming Guide (chapter 3.3.2) and in the CUDA Reference Manual ("Module Management" chapter).


An encapsulating function could look like this, for example:

mystuff.cu:

... //your device square_array function

void host_square_array(dim3 grid, dim3 block, float *deviceA, int N) {
  square_array <<< grid, block >>> (deviceA, N);
}

mystuff.h

#include <cuda_runtime.h> // declares dim3; <cuda.h> is the driver API header and does not
void host_square_array(dim3 grid, dim3 block, float *deviceA, int N);

mymain.cpp

#include "mystuff.h"

int main() { ... //your normal host code
}
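Because the wrapper takes the grid and block sizes as arguments, mymain.cpp has to work them out itself. Below is a minimal sketch of that host-side calculation in plain C++, mirroring the n_blocks expression from the question. The helper name blocks_needed is hypothetical. The build commands in the trailing comment assume that nvcc and g++ are installed and that CUDA lives under /usr/local/cuda, so adjust them to your setup.

```cpp
// Host-side launch configuration for mymain.cpp (plain C++, no CUDA syntax).
#include <cassert>

// Hypothetical helper: how many blocks of `block_size` threads are needed
// to cover `n` elements. Same result as the question's
//   N/block_size + (N%block_size == 0 ? 0 : 1)
inline int blocks_needed(int n, int block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;  // integer ceiling division
}

// In mymain.cpp, with mystuff.h included and a_d allocated by cudaMalloc,
// the kernel launch through the wrapper would then look like:
//   const int block_size = 4;
//   host_square_array(dim3(blocks_needed(N, block_size)), dim3(block_size), a_d, N);
//
// One possible build (paths are assumptions, see above):
//   nvcc -c mystuff.cu -o mystuff.o
//   g++  -c mymain.cpp -o mymain.o -I/usr/local/cuda/include
//   g++  mymain.o mystuff.o -o myprog -L/usr/local/cuda/lib64 -lcudart
```

With N = 10 and block_size = 4 this gives 3 blocks, the same launch configuration as the original single-file program. The three build commands map directly onto makefile rules: one rule per object file plus a link rule.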

Upvotes: 0

kokosing

Reputation: 5601

Code that uses CUDA C extensions has to be in a *.cu file; the rest can be in a C++ file.

So here your kernel code can be moved to a separate *.cu file.

To keep the main function in a C++ file, you need to wrap the kernel invocation (the code with square_array<<<...>>>(...);) in a C++ function whose implementation also lives in the *.cu file.

Functions such as cudaMalloc can stay in the C++ file as long as you include the proper CUDA headers.

Upvotes: 1
