thchittenden
thchittenden

Reputation: 85

C++ CUDA Pointer-to-member

I'm wondering if there's anyway to pass a pointer-to-member to a device function in CUDA. Since the pointer is really just relative to the struct/class it doesn't seem like there should be any reason it wouldn't work but I can't seem to get the code to compile.

#include <stdio.h>


struct S {
    int F1;
    int F2;
    int F3;
};

__device__ S x;

__global__ void initialize_S() {
    x.F1 = 100;
    x.F2 = 200;
    x.F3 = 300;
}

__global__ void print_S(int S::* m) {
    printf("val: %d\n", x.*m);
}

int main() {

    initialize_S<<<1, 1>>>();
    print_S<<<1, 1>>>(&S::F1);

    cudaDeviceSynchronize();
}

When compiling I get the following error with NVCC v5.5

/tmp/tmpxft_000068a5_00000000-16_ptm.o: In function `main':
tmpxft_000068a5_00000000-3_ptm.cudafe1.cpp:(.text+0xcf): undefined reference to `print_S(int S::*)'
/tmp/tmpxft_000068a5_00000000-16_ptm.o: In function `__device_stub__Z7print_SM1Si(long)':
tmpxft_000068a5_00000000-3_ptm.cudafe1.cpp:(.text+0x17f): undefined reference to `print_S(int S::*)'
tmpxft_000068a5_00000000-3_ptm.cudafe1.cpp:(.text+0x184): undefined reference to `print_S(int S::*)'
collect2: error: ld returned 1 exit status

Any help would be appreciated. Thanks!

EDIT: after traipsing through the code genrerated by NVCC it actually looks like it's generating it wrong:

extern void __device_stub__Z7print_SM1Si(long);
void __device_stub__Z7print_SM1Si( long __par0) { if (cudaSetupArgument((void *)(char *)&__par0, sizeof(__par0), (size_t)0UL) !=
cudaSuccess) return; { volatile static char *__f __attribute__((unused)); __f = ((char *)((void ( *)(long))print_S)); (void)cudaL
aunch(((char *)((void ( *)(long))print_S))); }; }
# 18 "ptm.cu"
void print_S( long __cuda_0)
# 18 "ptm.cu"
{__device_stub__Z7print_SM1Si( __cuda_0);

}

By patching the generated code to convert these "long"s to "int S::*"s it compiles and functions correctly.

 extern void __device_stub__Z7print_SM1Si(int S::*);
 void __device_stub__Z7print_SM1Si(int S::* __par0) { if (cudaSetupArgument((void *)(char *)&__par0, sizeof(__par0), (size_t)0UL)
 != cudaSuccess) return; { volatile static char *__f __attribute__((unused)); __f = ((char *)((void ( *)(int S::*))print_S)); (voi
 d)cudaLaunch(((char *)((void ( *)(int S::*))print_S))); }; }
 # 18 "ptm.cu"
 void print_S(int S::* __cuda_0)
 # 18 "ptm.cu"
 {__device_stub__Z7print_SM1Si( __cuda_0);

 }

Upvotes: 4

Views: 425

Answers (2)

Robert Crovella
Robert Crovella

Reputation: 152173

This appears to be a limitation of nvcc as already indicated elsewhere. I have filed a bug with the compiler team. They are aware of the issue. I don't have any further information about a possible update or schedule.

A possible workaround was suggested as follows, for Linux/MacOS only:

#include <stdio.h>

template <typename T>
struct dummy {
  T inner;
  T __host__ __device__ get(void) { return inner; };
  __host__ __device__ dummy(T in) : inner(in) { };
};



struct S {
    int F1;
    int F2;
    int F3;
};

__device__ S x;

__global__ void initialize_S() {
    x.F1 = 100;
    x.F2 = 200;
    x.F3 = 300;
}

__global__ void print_S(dummy<int S::*> m) {
    printf("val: %d\n", x.*(m.get()));
}

int main() {

    initialize_S<<<1, 1>>>();
    print_S<<<1, 1>>>(dummy<int S::*>(&S::F1));

    cudaDeviceSynchronize();
}

I'm not able to comment on the usefulness of the above. The above seems to compile and run correctly on CUDA 6.0

Also, usage of pointer-to-member appears to work correctly in device code. The limitation described here is specific to its usage when passed as a __global__ function parameter.

Upvotes: 3

thchittenden
thchittenden

Reputation: 85

It looks like this is a limitation of NVCC at the moment. I've posted in the NVIDIA dev forums so hopefully this gets resolved!

Upvotes: 2

Related Questions