Reputation: 557
I would like to implement f1 with the argument and return value exactly as in the code below. It fails with the error:
a reference of type "float1 &" (not const-qualified) cannot be initialized with a value of type "float"
However, almost the same function f2, which returns the native C++ type float instead of the CUDA-specific wrapper float1, works nicely.
https://godbolt.org/z/1j1e1r98d
__device__ float1& f1(float4& v) {
    return v.x; // ERROR
}
__device__ float& f2(float4& v) {
    return v.x; // OK
}
How can I change the implementation of f1 to fix this error?
Upvotes: 0
Views: 158
Reputation: 402
This solution was discussed in the comments, where I also said that I find it a bit dirty. However, if the CUDA specification guarantees compatible alignment and layout for float4 and float1, this could be a valid option. (The CUDA C++ Programming Guide lists an alignment of 4 bytes for float1 and 16 bytes for float4.)
__device__ float1& f1(float4& v) {
    return *reinterpret_cast<float1*>(&v);
}
__device__ float& f2(float4& v) {
    return v.x;
}
In this solution, you reinterpret the address of v as a pointer to float1. Dereferencing that pointer gives a float1& that refers to the first component of v, because x sits at the start of float4.
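To illustrate that the returned reference aliases v.x, here is a minimal sketch. It is not part of the original discussion. The HD macro, the host-only stand-in structs (used only when the file is not compiled with nvcc), and the helper name write_through_f1 are assumptions made for illustration. The cast relies on the layout of the two types and is not something the C++ standard itself guarantees.

```cpp
#if defined(__CUDACC__)
#include <cuda_runtime.h>          // real float1 / float4 from the CUDA headers
#define HD __host__ __device__
#else
// Host-only stand-ins (assumption): mimic the layout of CUDA's float1/float4
// so the sketch can also be built without nvcc.
struct float1 { float x; };
struct alignas(16) float4 { float x, y, z, w; };
#define HD
#endif

// Same cast as in the answer: view the start of v as a float1.
HD inline float1& f1(float4& v) {
    return *reinterpret_cast<float1*>(&v);
}

// Hypothetical helper: writes through the float1& and returns v.x afterwards.
HD inline float write_through_f1(float value) {
    float4 v = {1.0f, 2.0f, 3.0f, 4.0f};
    f1(v).x = value;   // intended to modify v.x, since x is at offset 0 of float4
    return v.x;
}
```

If the layout assumption holds, calling write_through_f1 with some value should return that same value, and the address of f1(v).x should equal the address of v.x.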
Be careful with reinterpret_cast between different struct types when it comes to alignment and offsets.
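One way to make those alignment and offset assumptions explicit is to check them at compile time. The following is only a sketch: the stand-in structs in the non-nvcc branch are assumptions added so the snippet builds on a plain host compiler, and the checks cover layout only, not the language's aliasing rules.

```cpp
#include <cstddef>   // offsetof

#if defined(__CUDACC__)
#include <cuda_runtime.h>          // real float1 / float4 from the CUDA headers
#else
// Host-only stand-ins (assumption) that mimic the CUDA vector types.
struct float1 { float x; };
struct alignas(16) float4 { float x, y, z, w; };
#endif

// x must be the first member of float4, so &v and &v.x are the same address.
static_assert(offsetof(float4, x) == 0, "float4::x is not at offset 0");
// float1 must be exactly one float wide, so it covers only the x component.
static_assert(sizeof(float1) == sizeof(float), "float1 is not a single float");
// float4 must be at least as strictly aligned as float1.
static_assert(alignof(float4) >= alignof(float1), "float4 is under-aligned for float1");
```

If any of these conditions did not hold for the types in use, compilation would stop with the corresponding message instead of producing a cast that silently reads the wrong bytes.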
In the Compiler Explorer example, you can see that both functions produce exactly the same output.
Upvotes: 1