AstrOne

Reputation: 3769

CUDA: How to return a device lambda from a host function

I have a virtual function which returns a different lambda depending on the derived class:

#include <functional>

class Base
{
public:
    virtual std::function<float()> foo(void) = 0;
};

class Derived : public Base
{
public:
    std::function<float()> foo(void) {
        return [] __device__ (void) {
            return 1.0f;
        };
    }
};

Then I want to pass this lambda to a CUDA kernel and call it from the device. In other words, I want to do this:

template<typename Func>
__global__ void kernel(Func f) {
    f();
}

int main(int argc, char** argv)
{
    Base* obj = new Derived;
    kernel<<<1, 1>>>(obj->foo());
    cudaDeviceSynchronize();
    return 0;
}

The above gives an error like this: calling a __host__ function("std::function<float ()> ::operator ()") from a __global__ function("kernel< ::std::function<float ()> > ") is not allowed

As you can see, I declare my lambda as __device__, but the foo() method stores it in a std::function in order to return it. As a result, what is passed to kernel() is a host address and of course it does not work. I guess that is my problem, right? So my questions are:

1. Is it somehow possible to create a __device__ std::function and return that from the foo() method?

2. If this is not possible, is there any other way to dynamically select a lambda and pass it to the CUDA kernel? Hard-coding multiple calls to kernel() with all the possible lambdas is not an option.

So far, from the quick research I did, CUDA does not have/support the necessary syntax required to make a function return a device lambda. I just hope I am wrong. :) Any ideas?

Thanks in advance

Upvotes: 5

Views: 1469

Answers (2)

tdoublep

Reputation: 33

While I don't think you can achieve what you want using virtual functions that return device lambdas, you can achieve something similar by passing a class with a static __device__ member function as the template parameter to your kernel. An example is provided below. Note that the classes in this example could also be structs if you prefer.

#include <cstdlib>   // rand, RAND_MAX
#include <iostream>

// Operation: Element-wise logarithm
class OpLog {
    public:
    __device__ static void foo(int tid, float * x) {
        x[tid] = logf(x[tid]);
    }
};

// Operation: Element-wise exponential
class OpExp {
    public:
    __device__ static void foo(int tid, float * x) {
        x[tid] = expf(x[tid]);
    }
};

// Generic kernel
template < class Op >
__global__ void my_kernel(float * x) {
    int tid = threadIdx.x;
    Op::foo(tid,x);
}

// Driver
int main() {

    using namespace std;

    // length of vector
    int len = 10;

    // generate data
    float * h_x = new float[len];
    for(int i = 0; i < len; i++) {
        h_x[i] = rand()/float(RAND_MAX);
    }

    // inspect data
    cout << "h_x = [";
    for(int j = 0; j < len; j++) {
        cout << h_x[j] << " ";
    }
    cout << "]" << endl;

    // copy onto GPU
    float * d_x;
    cudaMalloc(&d_x, len*sizeof(float));
    cudaMemcpy(d_x, h_x, len*sizeof(float), cudaMemcpyHostToDevice);

    // Take the element-wise logarithm
    my_kernel<OpLog><<<1,len>>>(d_x);

    // get result
    cudaMemcpy(h_x, d_x, len*sizeof(float), cudaMemcpyDeviceToHost);
    cout << "h_x = [";
    for(int j = 0; j < len; j++) {
        cout << h_x[j] << " ";
    }
    cout << "]" << endl;

    // Take the element-wise exponential
    my_kernel<OpExp><<<1,len>>>(d_x);

    // get result
    cudaMemcpy(h_x, d_x, len*sizeof(float), cudaMemcpyDeviceToHost);
    cout << "h_x = [";
    for(int j = 0; j < len; j++) {
        cout << h_x[j] << " ";
    }
    cout << "]" << endl;

    // clean up
    cudaFree(d_x);
    delete[] h_x;

    return 0;
}

Upvotes: 1

einpoklum

Reputation: 131976

Before actually answering, I have to wonder whether your question isn't an XY problem. That is, I am by default skeptical that people have a good excuse for executing code through lambdas/function pointers on the device.

But I won't evade your question like that...

Is it somehow possible to create a __device__ std::function and return that from the foo() method?

Short answer: No, try something else.

Longer answer: If you want to implement a large chunk of the standard library on the device side, then maybe you could have a device-side std::function-like class. But I'm not sure that's even possible (quite possibly not), and anyway - it's beyond the capabilities of everyone except very seasoned library developers. So, do something else.

If this is not possible, is there any other way to dynamically select a lambda and pass it to the CUDA kernel? Hard-coding multiple calls to kernel() with all the possible lambdas is not an option.

First, remember that lambdas are essentially anonymous classes - and thus, if they don't capture anything, they're reducible to function pointers since the anonymous classes have no data, just an operator().

So if the lambdas have the same signature and no capture, you can convert them to a (non-member) function pointer and pass that to the kernel; this definitely works - see this simple example on nVIDIA's forums.
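For illustration, here is a minimal sketch of the idea (my own example, not the forum one; the function names and selection logic are arbitrary), using named __device__ functions, which is what a non-capturing lambda boils down to anyway. Since host code cannot take the address of a __device__ function directly, the pointers are stored in __device__ variables and read back with cudaMemcpyFromSymbol before being passed to the kernel:

#include <cstdio>

// Common signature for all the selectable functions
using FloatFn = float (*)();

__device__ float return_one() { return 1.0f; }
__device__ float return_two() { return 2.0f; }

// Host code cannot take the address of a __device__ function directly,
// so we store the pointers in __device__ variables and read them back.
__device__ FloatFn d_return_one = return_one;
__device__ FloatFn d_return_two = return_two;

__global__ void kernel(FloatFn f) {
    printf("%f\n", f());
}

int main() {
    FloatFn h_f;
    // Select at run time which function the kernel will call
    cudaMemcpyFromSymbol(&h_f, d_return_one, sizeof(FloatFn));
    kernel<<<1, 1>>>(h_f);
    cudaDeviceSynchronize();
    return 0;
}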

Another possibility is using a run-time mapping from type ids, or other such keys, to instances of these types - or rather, to constructors. That is, using a factory. But I don't want to get into the details of that, so as not to make this answer longer than it already is; and it's probably not a good idea anyway.
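As a rough sketch only - the key names and launcher signature below are hypothetical, and the functor-plus-templated-kernel pattern mirrors the other answer - one variation maps string keys to launch wrappers rather than to constructed objects:

#include <functional>
#include <map>
#include <string>

// Functors with a static __device__ member, as in the other answer
struct OpLog { __device__ static void foo(int tid, float* x) { x[tid] = logf(x[tid]); } };
struct OpExp { __device__ static void foo(int tid, float* x) { x[tid] = expf(x[tid]); } };

template <class Op>
__global__ void my_kernel(float* x) { Op::foo(threadIdx.x, x); }

// Hypothetical dispatch table: each entry wraps one fully-instantiated
// kernel launch, so the operation can be chosen by key at run time.
std::map<std::string, std::function<void(float*, int)>> launchers = {
    {"log", [](float* d_x, int len) { my_kernel<OpLog><<<1, len>>>(d_x); }},
    {"exp", [](float* d_x, int len) { my_kernel<OpExp><<<1, len>>>(d_x); }},
};

int main() {
    int len = 10;
    float* d_x;
    cudaMalloc(&d_x, len * sizeof(float));
    // ... fill d_x with data ...
    launchers.at("exp")(d_x, len);  // the key could come from user input
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}

This is dispatch by key rather than a true object factory, but it gives the same run-time selection without hard-coding every call at the call site.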

Upvotes: 2
