0xbadf00d

Reputation: 18178

Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code

I'm receiving the compiler error

static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.'

when compiling this code:

thrust::device_vector<float2> a;
thrust::device_vector<float> b;

float param1, param2;
float2 param3;

thrust::transform_reduce(
    thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
    thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
    [param1, param2, param3] __device__ (thrust::tuple<float2, float> const& tuple)
    {
        /* do something and return a float2 */
    },
    float2{},
    [] __device__ (float2 const& first, float2 const& second)
    {
        float2 result{};
        result.x = first.x + second.x;
        result.y = first.y + second.y;
        return result;
    });

How can I rewrite this code so that it compiles? I'm new to CUDA/Thrust, and it's quite unclear to me how to deal with this problem.

(BTW, do I really need the complicated "transformation" function here? thrust::plus<float2> doesn't work.)

Upvotes: 2

Views: 482

Answers (2)

alfC

Reputation: 16242

To complement Robert Crovella's answer, this is what worked and didn't work for me with CUDA 12.0 and CUDA 12.5 for a typical Thrust function over a GPU container:

auto res2 = thrust::transform_reduce(
    x.begin(), x.end(),

// vvv--- works (?) in CUDA 12.0, doesn't work with CUDA 12.5
//  [] __device__ (T const& e) {return std::abs(e.real()) + std::abs(e.imag());},

// vvv--- works (?) in CUDA 12.0, doesn't work with CUDA 12.5
//  [] __device__ (T const& e) -> double {return std::abs(e.real()) + std::abs(e.imag());},

// vvv--- doesn't work in CUDA 12.0, works with CUDA 12.5
//  cuda::proclaim_return_type<double>([] __device__ (T const& e) {return std::abs(e.real()) + std::abs(e.imag());}),

// vvv--- doesn't work in either version
//  [] (T const& e) constexpr {return std::abs(e.real()) + std::abs(e.imag());},

// vvv--- works, but needs function to be host also
    [] __host__ __device__ (T const& e) {return std::abs(e.real()) + std::abs(e.imag());},

    double{}, thrust::plus<>{}
);

(By "doesn't work" I mean "doesn't compile"; by "works" I mean "compiles and apparently gives the right result.")

Upvotes: 0

Robert Crovella

Reputation: 151869

How can I rewrite this code so that it compiles?

A possible solution seems to be indicated in the assert message you posted in your question:

static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.'

When I change each lambda to be decorated with __host__ __device__ instead of just __device__, the code compiles for me on CUDA 12.1.
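As a sketch, here is the call from the question with that change applied; the body of the transformation lambda is left as the placeholder comment from the question, since the original elides it:

```cuda
// Same call as in the question, but each lambda is decorated
// __host__ __device__ so that host code can query its return type.
thrust::transform_reduce(
    thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
    thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
    [param1, param2, param3] __host__ __device__ (thrust::tuple<float2, float> const& tuple)
    {
        /* do something and return a float2 */
    },
    float2{},
    [] __host__ __device__ (float2 const& first, float2 const& second)
    {
        float2 result{};
        result.x = first.x + second.x;
        result.y = first.y + second.y;
        return result;
    });
```

As the assert message says, wrapping each __device__ lambda in cuda::proclaim_return_type<float2>(...) (from <cuda/functional>) is an alternative when the lambda must stay device-only.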

(BTW, do I really need the complicated "transformation" function here? thrust::plus doesn't work.)

CUDA doesn't provide arithmetic operators for its built-in vector types (such as float2), and AFAIK Thrust doesn't either:

then x+y must be defined

So you'll need to provide your own definition to add float2 types, i.e.

__host__ __device__ __forceinline__
float2 operator+(float2 left, float2 right) noexcept {
    return float2{left.x + right.x,
                  left.y + right.y};
}

will allow you to use thrust::plus<float2>.
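With that operator+ in scope, the binary reduction lambda from the question can be dropped; as a sketch (transform_op stands in for the question's elided transformation functor):

```cuda
// Sketch: the hand-written reduction lambda replaced by thrust::plus<float2>,
// which compiles once operator+ for float2 is defined as above.
auto result = thrust::transform_reduce(
    thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
    thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
    transform_op,   // hypothetical name: a named functor or
                    // __host__ __device__ lambda returning float2
    float2{},
    thrust::plus<float2>{});
```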

Upvotes: 2
