Marko Grdinić

Reputation: 4062

Does CUDA C++ not have tuples in device code?

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    auto lamb = [](int x) {return x + 1; }; // Works.
    auto t = std::make_tuple(1, 2, 3); // Does not work.
    c[i] = a[i] + b[i];
}

NVCC has lambdas at least, but std::make_tuple fails to compile. Are tuples not allowed in the current version of CUDA?

Upvotes: 5

Views: 3538

Answers (4)

einpoklum

Reputation: 131525

Indeed, CUDA itself does not offer a device-side-capable version of std::tuple. However, I have a full tuple implementation as part of my cuda-kat library (still very much under initial development at the time of writing). By comparison, Thrust's tuple class is limited in the following ways:

  1. Limited to 10 tuple elements.
  2. Recursively expands templated types for every tuple element.
  3. No/partial support for rvalues (e.g. in get())

The tuple implementation in cuda-kat is an adaptation of the EASTL tuple, which in turn is an adaptation of the LLVM project's libc++ tuple. Unlike the EASTL's, however, it is C++11-compatible, so you don't need the very latest CUDA version. If you need only the tuple class, it can be extracted from the library on its own; I think it takes about 4 files.

Upvotes: 2

ArtemB

Reputation: 3622

Support for the standard C++ library on the device side is problematic for CUDA because the standard library does not carry the necessary __host__ or __device__ annotations.

That said, both clang and nvcc do have partial support for some functionality. Usually it's limited to constexpr functions, which are treated as __host__ __device__ if you pass --expt-relaxed-constexpr to nvcc (clang does this by default). Clang also has a bit more support for standard math functions. Neither supports anything that relies on the C++ runtime (except for memory allocation, printf and assert), as that does not exist on the device side.

In short, most of the standard C++ library is unusable on the device side in CUDA, though things slowly improve as more and more functions in the standard library become constexpr.
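To make the annotation point concrete, here is a minimal sketch of the kind of marking that device-usable code needs and that standard library headers generally lack. The names HD, Pair, make_pair_hd, pair_sum and host_check are hypothetical, invented for this illustration; they come from neither CUDA nor any library. The macro expands to __host__ __device__ only when nvcc defines __CUDACC__, so the same file also builds with a plain host compiler.

```cpp
#include <cassert>

// Under nvcc, __CUDACC__ is defined and HD marks a function as callable
// from both host and device code. Under a plain host compiler HD is empty.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical two-element aggregate, written only for this illustration.
template <typename A, typename B>
struct Pair {
    A first;
    B second;
};

// Explicitly annotated factory: this is the marking std::make_tuple lacks.
template <typename A, typename B>
HD constexpr Pair<A, B> make_pair_hd(A a, B b) {
    return Pair<A, B>{a, b};
}

// Annotated accessor-style helper; assumes A + B converts to A.
template <typename A, typename B>
HD constexpr A pair_sum(const Pair<A, B>& p) {
    return p.first + p.second;
}

// Compile-time check that the helpers are usable in constant expressions.
static_assert(pair_sum(make_pair_hd(3, 4)) == 7, "pair_sum should add both members");

// Host-side runtime check of the same helpers.
inline void host_check() {
    assert(make_pair_hd(1, 2).first == 1);
    assert(pair_sum(make_pair_hd(3, 4)) == 7);
}
```

When compiled by nvcc, a kernel could call make_pair_hd and pair_sum directly because of the explicit annotations; this is only a sketch of the idea, not a substitute for a real tuple type.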

Upvotes: 2

dada_dave

Reputation: 493

I've just tried this out: tuple metaprogramming with std:: (std::tuple, std::get, etc.) works in device code when you compile with C++14 and --expt-relaxed-constexpr enabled (CUDA 8+), e.g. nvcc -std=c++14 xxxx.cu -o yyyyy --expt-relaxed-constexpr. C++14 requires CUDA 9, but basic std::tuple should work in CUDA 8 if you are limited to that. thrust::tuple also works but has some drawbacks: it is limited to 10 items and lacks some of the std::tuple helper functions (e.g. std::tuple_cat). Because tuples and their related functions are resolved at compile time, --expt-relaxed-constexpr should enable your std::tuple to "just work".

#include <cstdio>
#include <tuple>

__global__ void kernel()
{
    auto t = std::make_tuple(1, 2, 3);
    printf("%d\n",std::get<0>(t));
}

int main()
{
   kernel<<<1,1>>>();
   cudaDeviceSynchronize();
}
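As a host-only illustration of the "compile-time" point, the sketch below shows that std::make_tuple, std::get and std::tuple_cat are constexpr from C++14 onward, which is the property that --expt-relaxed-constexpr relies on. The function names third_of_joined and sum3 are hypothetical, chosen just for this example. It is plain C++ checked on the host only; it does not by itself demonstrate device behavior.

```cpp
#include <cassert>
#include <tuple>

// Joins (1, 2) and (3, 4) into a 4-element tuple and reads element 2.
// All three standard functions used here are constexpr since C++14.
constexpr int third_of_joined() {
    return std::get<2>(std::tuple_cat(std::make_tuple(1, 2),
                                      std::make_tuple(3, 4)));
}

// Sums the elements of a 3-int tuple using only constexpr std::get.
constexpr int sum3(const std::tuple<int, int, int>& t) {
    return std::get<0>(t) + std::get<1>(t) + std::get<2>(t);
}

// Both checks are evaluated by the compiler, not at run time.
static_assert(third_of_joined() == 3, "element 2 of (1, 2, 3, 4) is 3");
static_assert(sum3(std::make_tuple(1, 2, 3)) == 6, "1 + 2 + 3 is 6");
```

Compile with -std=c++14 or later. If the static_asserts pass, the tuple operations were carried out entirely in constant expressions.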

Upvotes: 4

Marko Grdinić

Reputation: 4062

#include <thrust/tuple.h>

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    auto lamb = [](int x) {return x + 1; }; // Works.
    auto t = thrust::make_tuple(1, 2, 3);
    c[i] = a[i] + b[i];
}

It seems I needed to use the tuples from the Thrust library instead to make this work. The above compiles.

Upvotes: 2
