Reputation: 1

Assignment of function pointer with the unified memory in CUDA

I am trying to implement dynamic binding of functions in CUDA using the convenient unified memory model. Here, we have a struct Parameters with a single member, a function pointer void (*p_func)().

#include <cstdio>

struct Parameters {
    void (*p_func)();
};

The struct is allocated in unified (managed) memory, and we assign the actual function func_A to p_func.

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
    return;
}

The problem arises in the following code. If assignment 1 runs, i.e., para->p_func = func_A on the host, then both the host and the device see the host address of func_A. In contrast, if assignment 2 runs, both see the device address.

__global__ void assign_func_pointer(Parameters* para) {
    para->p_func = func_A;
}

__global__ void run_on_device(Parameters* para) {
    printf("run on device with address %p\n", para->p_func);
    para->p_func();
}

void run_on_host(Parameters* para) {
    printf("run on host with address %p\n", para->p_func);
    para->p_func();
}

int main(int argc, char* argv[]) {

    Parameters* para;
    cudaMallocManaged(&para, sizeof(Parameters));

    // assignment 1: when this section is enabled, p_func holds the host address
    para->p_func = func_A;
    printf("addr@host: %p\n", para->p_func);

    // assignment 2: when this section is enabled, p_func holds the device address
    assign_func_pointer<<<1,1>>>(para);
    cudaDeviceSynchronize();
    printf("addr@device: %p\n", para->p_func);

    run_on_device<<<1,1>>>(para);
    cudaDeviceSynchronize();

    run_on_host(para);

    cudaFree(para);
    return 0;
}

The question is: under the unified memory model, is it possible for the function pointer to refer to the correct function address on both the host and the device?

Upvotes: 0

Answers (2)

Robert Crovella

Reputation: 151879

With some modifications to the struct definition, something like this may be possible:

$ cat t1288.cu
#include <cstdio>

struct Parameters {
    void (*p_hfunc)();
    void (*p_dfunc)();
    __host__ __device__
    void p_func(){
      #ifdef __CUDA_ARCH__
      (*p_dfunc)();
      #else
      (*p_hfunc)();
      #endif
      }
};

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
    return;
}

__global__ void assign_func_pointer(Parameters* para) {
    para->p_dfunc = func_A;
}

__global__ void run_on_device(Parameters* para) {
    printf("run on device\n");
    para->p_func();
}

void run_on_host(Parameters* para) {
    printf("run on host\n");
    para->p_func();
}

int main(int argc, char* argv[]) {

    Parameters* para;
    cudaMallocManaged(&para, sizeof(Parameters));

    // assignment 1: p_hfunc holds the host address of func_A
    para->p_hfunc = func_A;
    printf("addr@host: %p\n", para->p_hfunc);

    // assignment 2: p_dfunc holds the device address of func_A
    assign_func_pointer<<<1,1>>>(para);
    cudaDeviceSynchronize();
    printf("addr@device: %p\n", para->p_dfunc);

    run_on_device<<<1,1>>>(para);
    cudaDeviceSynchronize();
    run_on_host(para);

    cudaFree(para);
    return 0;
}
$ nvcc -arch=sm_35 -o t1288 t1288.cu
$ cuda-memcheck ./t1288
========= CUDA-MEMCHECK
addr@host: 0x402add
addr@device: 0x8
run on device
func_A is correctly invoked!
run on host
func_A is correctly invoked!
========= ERROR SUMMARY: 0 errors
$

I concur with the other answer that it is currently not possible, even with managed memory, to have a single numerical function pointer that works correctly in both host code and device code.
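
If you would rather not launch a kernel just to capture the device address, a variation on the code above might look like the following. This is only a sketch: the func_t typedef, the d_func_A variable, and the run_demo helper are names I have made up for illustration. It assumes that a __device__ function pointer variable can be statically initialized with the device address of func_A, which is then copied to the host with cudaMemcpyFromSymbol.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef void (*func_t)();  // hypothetical alias, not in the original code

struct Parameters {
    func_t p_hfunc;  // only meaningful when called from host code
    func_t p_dfunc;  // only meaningful when called from device code
    __host__ __device__
    void p_func() {
#ifdef __CUDA_ARCH__
        (*p_dfunc)();
#else
        (*p_hfunc)();
#endif
    }
};

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
}

// Hypothetical helper variable: it is initialized in the device compilation
// pass, so it is assumed to hold the device address of func_A.
__device__ func_t d_func_A = func_A;

__global__ void run_on_device(Parameters* para) {
    para->p_func();
}

// Hypothetical driver: returns 0 on success, nonzero if a CUDA call fails.
int run_demo() {
    Parameters* para = nullptr;
    if (cudaMallocManaged(&para, sizeof(Parameters)) != cudaSuccess) return 1;

    // Host address, taken in host code.
    para->p_hfunc = func_A;

    // Device address, read back from the __device__ variable.
    func_t tmp = nullptr;
    if (cudaMemcpyFromSymbol(&tmp, d_func_A, sizeof(func_t)) != cudaSuccess) {
        cudaFree(para);
        return 2;
    }
    para->p_dfunc = tmp;

    run_on_device<<<1,1>>>(para);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) para->p_func();  // host-side call

    cudaFree(para);
    return err == cudaSuccess ? 0 : 3;
}

int main() {
    return run_demo();
}
```

The struct still carries two pointers, so this does not change the conclusion. It only moves the device-side assignment from a kernel to a host-side copy.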

Upvotes: 1

talonmies

Reputation: 72349

Leaving aside the technicalities of unified memory for a moment, your question is effectively "can one variable simultaneously have two different values?" and the answer to that is obviously no.

In more detail: CUDA unified memory fundamentally ensures that a given managed allocation will have consistent values (under certain constraints) when accessed from both host and device. What you are asking for is the complete opposite of that, and it obviously isn't supported.
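
If what you need is dynamic selection rather than a function pointer as such, one possible workaround is to store a value that means the same thing on both sides, such as an integer tag, and dispatch on it. The following is a minimal sketch under that assumption. FuncId, func_B and run_demo are hypothetical names that do not appear in your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// FuncId and func_B are hypothetical additions for this sketch.
enum FuncId { FUNC_A, FUNC_B };

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
}

__host__ __device__
void func_B() {
    printf("func_B is correctly invoked!\n");
}

struct Parameters {
    FuncId id;  // one value, with the same meaning on host and device

    __host__ __device__
    void p_func() const {
        // Each compilation pass resolves the calls to its own addresses.
        switch (id) {
            case FUNC_A: func_A(); break;
            case FUNC_B: func_B(); break;
        }
    }
};

__global__ void run_on_device(const Parameters* para) {
    para->p_func();
}

// Hypothetical driver: returns 0 on success, nonzero if a CUDA call fails.
int run_demo() {
    Parameters* para = nullptr;
    if (cudaMallocManaged(&para, sizeof(Parameters)) != cudaSuccess) return 1;

    para->id = FUNC_A;  // a single assignment, made on the host

    run_on_device<<<1,1>>>(para);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) para->p_func();  // host-side call

    cudaFree(para);
    return err == cudaSuccess ? 0 : 2;
}

int main() {
    return run_demo();
}
```

The trade-off is that the set of callable functions has to be known at compile time, which a real function pointer would not require.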

Upvotes: 1
