Reputation: 3291
I have a struct containing the parameters of a linear function, as well as the function itself. What I want to do is copy this struct to the device and then evaluate the linear function. The following example doesn't make sense but it is sufficient to describe the difficulties I have:
struct model
{
    double* params;
    double (*func)(double*, double);
};
I don't know how to copy this struct to the device.
Here are my functions:
// init function for struct model
__host__ void model_init(model* m, double* params, double(*func)(double*,double))
{
    if(m)
    {
        m->params = params;
        m->func = func;
    }
}
__device__ double model_evaluate(model* m, double x)
{
    if(m)
    {
        return m->func(m->params, x);
    }
    return 0.0;
}
__host__ __device__ double linear_function(double* params, double x)
{
    return params[0] + params[1] * x;
}
__device__ double compute(model *d_linear_model)
{
    return model_evaluate(d_linear_model,1.0);
}
__global__ void kernel(double *array, model *d_linear_model, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        array[idx] = compute(d_linear_model);
    }
}
I know how to copy an array from host to device but I don't know how to do this for this concrete struct which contains a function.
The kernel call in main then looks like this:
int block_size = 4;
int n_blocks = N_array/block_size + (N_array % block_size == 0 ? 0:1);
kernel<<<n_blocks, block_size>>>(device_array, d_linear_model, N_array);
Upvotes: 0
Views: 598
Reputation: 151879
You've outlined two items that I consider to be somewhat more difficult than beginner-level CUDA programming:
1. the use of device function pointers
2. a "deep copy" operation (for the embedded params pointer in your model structure)

Both of these topics have been covered in other questions. For example, this question/answer discusses deep copy operations, which are needed when a data structure has embedded pointers to other data. And this question/answer links to a variety of resources on device function pointer usage.
But I'll go ahead and offer a possible solution for your posted case. Most of your code is usable as-is (at least for demonstration purposes). As mentioned already, your model structure will present two challenges:
struct model
{
    double* params; // requires a "deep copy" operation
    double (*func)(double*, double); // requires special handling for device function pointers
};
As a result, although most of your code is usable as-is, your "init" function is not. That might work for a host realization, but not for a device realization.
The deep copy operation requires us to copy the overall structure, plus separately copy the data pointed to by the embedded pointer, plus separately copy or "fixup" the embedded pointer itself.
The usage of a device function pointer is restricted by the fact that we cannot grab the actual device function pointer in host code - that is illegal in CUDA. So one possible solution is to use a __device__ construct to "capture" the device function pointer, then do a cudaMemcpyFromSymbol operation in host code, to retrieve the numerical value of the device function pointer, which can then be moved about in ordinary fashion.
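The capture step described above can be sketched in isolation as follows. This is only a minimal sketch: the names model_func_t, fetch_device_func and call_through_pointer are hypothetical and introduced purely for illustration, and the error handling is an assumption about how one might report a failure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical alias for the function-pointer type stored in the model.
typedef double (*model_func_t)(double*, double);

__host__ __device__ double linear_function(double* params, double x)
{
    return params[0] + params[1] * x;
}

// Initialized on the device side, so the variable holds the *device*
// address of linear_function. This is the "capture" construct.
__device__ model_func_t linear_function_ptr = linear_function;

// Hypothetical helper: reads the pointer value out of the __device__ symbol.
// The returned value is just a number on the host; it may only be called
// from device code. Returns nullptr if the copy fails.
model_func_t fetch_device_func()
{
    model_func_t fp = nullptr;
    cudaError_t err = cudaMemcpyFromSymbol(&fp, linear_function_ptr, sizeof(fp));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return fp;
}

// Hypothetical kernel showing that the retrieved value is callable on the device.
__global__ void call_through_pointer(model_func_t f, double* params, double x, double* out)
{
    *out = f(params, x);
}
```

The value returned by fetch_device_func can be stored in a host-side model and copied to the device like any other field. Calling it from host code is not valid.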
Here's a worked example building on what you have shown, demonstrating the two concepts above. I have not created a "device init" function - but all the code necessary to do that is in the main function. Once you've grasped the concepts, you can take whatever code you wish out of the main function below and craft it into your "device init" function, if you wish to create one.
Here's a worked example:
$ cat t968.cu
#include <iostream>
#define NUM_PARAMS 2
#define ARR_SIZE 1
#define nTPB 256
struct model
{
    double* params;
    double (*func)(double*, double);
};
// init function for struct model -- not using this for device operations
__host__ void model_init(model* m, double* params, double(*func)(double*,double))
{
    if(m)
    {
        m->params = params;
        m->func = func;
    }
}
__device__ double model_evaluate(model* m, double x)
{
    if(m)
    {
        return m->func(m->params, x);
    }
    return 0.0;
}
__host__ __device__ double linear_function(double* params, double x)
{
    return params[0] + params[1] * x;
}
__device__ double compute(model *d_linear_model)
{
    return model_evaluate(d_linear_model,1.0);
}
__global__ void kernel(double *array, model *d_linear_model, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        array[idx] = compute(d_linear_model);
    }
}
__device__ double (*linear_function_ptr)(double*, double) = linear_function;
int main(){
    // grab function pointer from device code
    double (*my_fp)(double*, double);
    cudaMemcpyFromSymbol(&my_fp, linear_function_ptr, sizeof(void *));
    // setup model
    model my_model;
    my_model.params = new double[NUM_PARAMS];
    my_model.params[0] = 1.0;
    my_model.params[1] = 2.0;
    my_model.func = my_fp;
    // setup for device copy of model
    model *d_model;
    cudaMalloc(&d_model, sizeof(model));
    // setup "deep copy" for params
    double *d_params;
    cudaMalloc(&d_params, NUM_PARAMS*sizeof(double));
    cudaMemcpy(d_params, my_model.params, NUM_PARAMS*sizeof(double), cudaMemcpyHostToDevice);
    // copy model to device
    cudaMemcpy(d_model, &my_model, sizeof(model), cudaMemcpyHostToDevice);
    // fixup device params pointer in device model
    cudaMemcpy(&(d_model->params), &d_params, sizeof(double *), cudaMemcpyHostToDevice);
    // run test
    double *d_array, *h_array;
    cudaMalloc(&d_array, ARR_SIZE*sizeof(double));
    h_array = new double[ARR_SIZE];
    for (int i = 0; i < ARR_SIZE; i++) h_array[i] = i;
    cudaMemcpy(d_array, h_array, ARR_SIZE*sizeof(double), cudaMemcpyHostToDevice);
    kernel<<<(ARR_SIZE+nTPB-1)/nTPB,nTPB>>>(d_array, d_model, ARR_SIZE);
    cudaMemcpy(h_array, d_array, ARR_SIZE*sizeof(double), cudaMemcpyDeviceToHost);
    std::cout << "Results: " << std::endl;
    for (int i = 0; i < ARR_SIZE; i++) std::cout << h_array[i] << " ";
    std::cout << std::endl;
    return 0;
}
$ nvcc -o t968 t968.cu
$ cuda-memcheck ./t968
========= CUDA-MEMCHECK
Results:
3
========= ERROR SUMMARY: 0 errors
$
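If you do want a "device init" function, the host-side steps from main above could be packaged roughly as follows. This is a sketch under assumptions, not part of the tested example: model_init_device and model_free_device are hypothetical names, and instead of the separate "fixup" cudaMemcpy used in main, this variant stages a host copy of the struct that already contains the device pointers and copies it once.

```cuda
#include <cuda_runtime.h>

struct model
{
    double* params;
    double (*func)(double*, double);
};

// Hypothetical "device init": builds a device-resident model from host
// parameters and a device function pointer that was already retrieved
// (for example via cudaMemcpyFromSymbol). Returns nullptr on failure.
model* model_init_device(const double* h_params, int num_params,
                         double (*d_func)(double*, double))
{
    double* d_params = nullptr;
    model*  d_model  = nullptr;
    size_t  bytes    = num_params * sizeof(double);

    // Deep copy, part 1: the data behind the embedded pointer.
    if (cudaMalloc(&d_params, bytes) != cudaSuccess) return nullptr;
    if (cudaMemcpy(d_params, h_params, bytes, cudaMemcpyHostToDevice) != cudaSuccess ||
        cudaMalloc(&d_model, sizeof(model)) != cudaSuccess) {
        cudaFree(d_params);
        return nullptr;
    }

    // Deep copy, part 2: stage the struct on the host with device pointers
    // already in place, then copy the whole struct in one transfer.
    model staged;
    staged.params = d_params;
    staged.func   = d_func;
    if (cudaMemcpy(d_model, &staged, sizeof(model), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_model);
        cudaFree(d_params);
        return nullptr;
    }
    return d_model;
}

// Hypothetical counterpart: releases the params array and the struct itself.
void model_free_device(model* d_model)
{
    if (!d_model) return;
    model staged;
    // Read the struct back to learn the device address of params.
    if (cudaMemcpy(&staged, d_model, sizeof(model), cudaMemcpyDeviceToHost) == cudaSuccess)
        cudaFree(staged.params);
    cudaFree(d_model);
}
```

In the example above, main would then reduce to something like model *d_model = model_init_device(my_model.params, NUM_PARAMS, my_fp); followed by the kernel launch and a final model_free_device(d_model);.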
For brevity of presentation, I've dispensed with proper CUDA error checking (instead I have run the code with cuda-memcheck to demonstrate that it is without runtime error) but I would recommend proper error checking if you're having any trouble with your code.
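One common shape for such error checking is sketched below. The names cuda_ok, CUDA_CHECK and check_kernel are hypothetical, and exiting on the first error is an assumption that suits small test programs rather than a general recommendation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call and returns false.
inline bool cuda_ok(cudaError_t err, const char* file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(err), file, line);
    return false;
}

// Hypothetical macro: wrap every runtime API call; abort on the first failure.
#define CUDA_CHECK(call)                                   \
    do {                                                   \
        if (!cuda_ok((call), __FILE__, __LINE__))          \
            exit(EXIT_FAILURE);                            \
    } while (0)

// Hypothetical helper: call immediately after a kernel launch.
inline void check_kernel()
{
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran
}
```

With this in place, a call such as cudaMalloc(&d_model, sizeof(model)); becomes CUDA_CHECK(cudaMalloc(&d_model, sizeof(model)));, and check_kernel(); goes on the line after the kernel<<<...>>> launch.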
Upvotes: 4