Returning references to CUDA-specific vector types

Question

I would like to implement f1 with the argument and return value exactly as in the code below. It fails with the error:

a reference of type "float1 &" (not const-qualified) cannot be initialized with a value of type "float"

However, the nearly identical function f2, which uses the native C++ type float instead of the CUDA-specific wrapper float1, works fine.

https://godbolt.org/z/1j1e1r98d

__device__ float1& f1(float4& v) {
    return v.x; // ERROR
}

__device__ float& f2(float4& v) {
    return v.x; // OK
}

How can I change the implementation of f1 to fix this error?

Bas Groothedde · Accepted Answer

This solution was discussed in the comments, where I also said that I find it a bit dirty. However, if the CUDA specification guarantees the alignment of float4 and float1 values, it could be a valid option:

__device__ float1& f1(float4& v) {
    return *reinterpret_cast<float1*>(&v);
}

__device__ float& f2(float4& v) {
    return v.x;
}

In this solution, you reinterpret the address of v as a pointer to float1 and then dereference the result. This gives you a float1& that refers to the start of v, which is where v.x is stored.

Be careful with reinterpret_cast between different structs when it comes to alignment and offsets.
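
The alignment and offset concerns can at least be checked at compile time. The following is a minimal sketch, not a guarantee of correctness: the static_assert conditions are my own choice of checks, and f1 is marked __host__ __device__ here (an assumption, the question uses __device__ only) so that the same function can also be exercised on the host. The assertions only verify the layout the cast relies on. They do not make the cast well-defined under the C++ aliasing rules, so treat it as something that works in practice rather than something the language promises.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Layout assumptions the cast relies on. If any of these fails,
// the code does not compile instead of misbehaving at run time.
static_assert(sizeof(float1) == sizeof(float),
              "float1 is expected to wrap exactly one float");
static_assert(offsetof(float4, x) == 0,
              "x is expected to be the first member of float4");
static_assert(alignof(float4) % alignof(float1) == 0,
              "a float4 address is expected to be suitably aligned for float1");

// Same body as in the answer. __host__ is added only for this sketch
// so the function can be called from host code as well.
__host__ __device__ float1& f1(float4& v) {
    return *reinterpret_cast<float1*>(&v);
}

// Hypothetical kernel showing a write through the returned reference.
__global__ void set_x(float4* v, float value) {
    f1(*v).x = value;  // modifies v->x in place
}
```

Because f1 returns a reference into v rather than a copy, assigning to f1(v).x changes v.x, and the other components of v are left untouched.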

In the Compiler Explorer example you can see that both functions produce exactly the same output.
