Michael Choi

Reputation: 630

Duplicate variadic template parameter

Context:

I am a junior software engineer; if I am reinventing the wheel, please let me know. I'd like to create a template function that wraps another function and calls it element-wise. For example:

// returns a*x + y
__device__ float saxpy(float a, float x, float y) {
  return a*x + y;
}


int main() {
  int A[4] = { 1,2,3,4 };
  int X[4] = { 1,2,3,4 };
  int Y[4] = { 1,1,1,1 };

  // A*X   = 1,4,9,16
  // A*X+Y = 2,5,10,17
  float *C = cudaReduce(saxpy, A, X, Y);

  for (int i = 0; i < 4; i++)
    printf("%g, ", C[i]); // should print "2, 5, 10, 17, "

  std::cin.ignore();
  return 0;
}

Importantly, I want this wrapper to hide the CUDA calls whenever I perform element-wise operations. Though very incomplete, here is my pseudo-code attempt at the function wrapper.

I'd like to provide a minimal example; however, I have very little idea how to go about certain aspects of C++, so please forgive the large amounts of commented pseudocode:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <iostream>

// returns a*x + y
__device__ float saxpy(float a, float x, float y) {
  return a*x + y;
}

// finds return type of function pointer
template<typename R, typename... A>
R ret(R(*)(A...));
template<typename C, typename R, typename... A>
R ret(R(C::*)(A...));

template<typename F, size_t N, typename... Args>
auto cudaReduce(F &f, Args(&...argsarray)[N]) {
  cudaSetDevice(0);

  // ret is function f's return type
  typedef decltype(ret(f)) ret;
  ret d_out[N], h_out[N];
  // cudaMalloc((void**)&d_out, sizeof(d_out));
  sendToCuda(argsarray...); // allocates and copies all contents of argsarray to cuda

  // reduceKernel<<<1, N>>>(f, d_out, dev_argsarray...);

  // cudaDeviceSynchronize();
  // cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  // cudaFree(d_out);

  // for d_args in d_argsarray
  //   cudaFree(d_args);

  return h_out;
}

template<typename F, size_t N, typename Out, typename... Args>
__global__ void cudaReduceKernel(F &f, Out(&out)[N], Args(&...argsarray)[N]) {
  int tid = threadIdx.x;
  int i = tid + blockIdx.x * blockDim.x;

  // Below is invalid syntax; however, the 'pseudo-code' is what I'd like to achieve.
  // out[i] = f(argsarray[i]...);
}

// cuda malloc and memcpy
template<typename Arg, size_t N>
void sendToCuda(Arg(&args)[N]) {
  size_t buffer = sizeof(args);
  //cudaMalloc((void**)&dev_arg[ ??? ], buffer);
  //cudaMemcpy((void**)&dev_arg[ ??? ], args, buffer, cudaMemcpyHostToDevice);
}
template<typename Arg, size_t N, typename... Args>
void sendToCuda(Arg(&args)[N], Args(&...argsarray)[N]) {
  sendToCuda(args);
  sendToCuda(argsarray...);
}

int main() {
  int A[4] = { 1,2,3,4 };
  int X[4] = { 1,2,3,4 };
  int Y[4] = { 1,1,1,1 };

  // A*X   = 1,4,9,16
  // A*X+Y = 2,5,10,17
  float *C = cudaReduce(saxpy, A, X, Y);

  for (int i = 0; i < 4; i++)
    printf("%g, ", C[i]); // should print "2, 5, 10, 17, ", currently prints undefined behaviour

  std::cin.ignore();
  return 0;
}

I realize not everyone has time to completely review the code, so I will boil down the key problems into several points:

1. Is it possible to duplicate variadic template inputs? If so, how? Example (not real code):

template<typename... Args>
void foo(Args... args) {
  Args... args2;
}

This is needed so that I can duplicate my input parameters to input parameters for my cuda malloc() and memcpy().

2. How would I get the i-th tuple of a variadic array parameter, like zipping in Python? Example (not real code):

template<typename... Args, size_t N>
void bar(Args(&...argsarray)[N]) {
  // (python) ithvariadic = zip(*argsarray)[i]
  auto ithvariadic = argsarray[i]...;
}

Upvotes: 0

Views: 378

Answers (1)

max66

Reputation: 66210

  1. Is it possible to duplicate variadic template inputs? If so, how? Example (not real code):
template <typename... Args>
 void foo(Args... args) {
  Args2... args;
 }

Not that way.

The Args... types can be deduced from the args... parameters.

But what about Args2...? How could they be deduced? Do you want to specify them explicitly?

But are you sure that you need different types?

If you don't need a different list of types, the best I can think of, as suggested by Jarod42, is to use a tuple.

Something like the following:

template <typename ... Args>
void foo (Args ... args)  
 {
   std::tuple<Args...> tpl { args... };

   // do something with tpl
 }
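
For the use case in the question (one buffer per input array), the same idea can be pushed a little further: a pack can be expanded inside a type pattern, so the tuple holds one related object per argument rather than a plain copy. What follows is only a host-side sketch under stated assumptions: `stage_copy` and `stage_all` are hypothetical names, and `std::vector` stands in for the `cudaMalloc`/`cudaMemcpy` pair, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical host-only stand-in for cudaMalloc + cudaMemcpy:
// copies one fixed-size array into its own buffer.
template <typename T, std::size_t N>
std::vector<T> stage_copy(const T (&src)[N])
{
    return std::vector<T>(src, src + N);
}

// Builds one buffer per input array. The pattern std::vector<Args>...
// expands once for every type in Args, so the tuple mirrors the pack.
template <typename... Args, std::size_t N>
std::tuple<std::vector<Args>...> stage_all(const Args (&...arrays)[N])
{
    return std::tuple<std::vector<Args>...>(stage_copy(arrays)...);
}

// Usage: two arrays of different element types, same length.
inline void usage_example()
{
    int   A[3] = { 1, 2, 3 };
    float X[3] = { 0.5f, 1.5f, 2.5f };

    // buffers is std::tuple<std::vector<int>, std::vector<float>>
    auto buffers = stage_all(A, X);
    assert(std::get<0>(buffers).size() == 3);
    assert(std::get<1>(buffers)[1] == 1.5f);
}
```

In a real CUDA wrapper the tuple element type would more likely be a raw device pointer (`Args*...`) or an owning handle, but the expansion pattern stays the same.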

or, if you want to enable perfect forwarding,

template <typename ... Args>
void foo (Args && ... args)  
 {
   std::tuple<Args...> tpl { std::forward<Args>(args)... };

   // do something with tpl
 }
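
One caveat with the forwarding version: when an lvalue is passed, `Args` is deduced as a reference type, so `std::tuple<Args...>` stores a reference rather than a copy. If independent copies are wanted, the types can be decayed first. A minimal sketch, assuming C++14 (`capture_by_value` is a hypothetical name; for most arguments it behaves like `std::make_tuple`):

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Stores a decayed copy of every argument: lvalues are copied,
// rvalues are moved, and no tuple element is a reference.
template <typename... Args>
auto capture_by_value(Args&&... args)
{
    return std::tuple<std::decay_t<Args>...>(std::forward<Args>(args)...);
}

// Usage: the tuple keeps its own copy of x.
inline void usage_example()
{
    int x = 1;
    auto tpl = capture_by_value(x, std::string("hi"));

    x = 5;                              // does not affect the stored copy
    assert(std::get<0>(tpl) == 1);
    assert(std::get<1>(tpl) == "hi");
}
```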
  2. How would I get the i-th tuple of a variadic array parameter, like zipping in Python? Example (not real code):
template<typename... Args, size_t N>
void bar(Args(&...argsarray)[N]) {
  // (python) ithvariadic = zip(*argsarray)[i]
  auto ithvariadic = argsarray[i]...;
}

What about

template <typename ... Args, std::size_t N>
void bar (Args (&...argsarray)[N])
 {
   for ( auto ui = 0u ; ui < N ; ++ui )
    {
      std::tuple<Args...> ithvariadic { argsarray[ui]... };

      // do something with ithvariadic
    }
 }

?
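
If the i-th elements are only needed as arguments to a call, the tuple can be skipped: `f(argsarray[i]...)`, which the question marks as pseudo-code, is a valid pack expansion and becomes `f(a0[i], a1[i], ...)`. Below is a host-only sketch, assuming C++14; `map_elementwise` is a hypothetical name, `saxpy` is the question's function without the `__device__` qualifier, and the result is returned in a `std::array` so that no pointer to a local array escapes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host version of the question's saxpy: returns a*x + y.
inline float saxpy(float a, float x, float y) { return a * x + y; }

// Calls f on the i-th element of every input array and stores the result.
template <typename F, typename... Args, std::size_t N>
auto map_elementwise(F f, const Args (&...arrays)[N])
{
    using R = decltype(f(arrays[0]...));   // return type of f for these inputs
    std::array<R, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = f(arrays[i]...);          // expands to f(a0[i], a1[i], ...)
    return out;
}

// Usage with the arrays from the question (as float).
inline void usage_example()
{
    float A[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float X[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float Y[4] = { 1.0f, 1.0f, 1.0f, 1.0f };

    auto C = map_elementwise(saxpy, A, X, Y);   // std::array<float, 4>
    assert(C[0] == 2.0f && C[3] == 17.0f);
}
```

The same expansion should carry over to a kernel body that receives one pointer per input, with `i` computed from the thread and block indices as in the question.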

Upvotes: 1
