Mark Schultz-Wu
Mark Schultz-Wu

Reputation: 374

Cyclically rotating a GPU vector?

I have an algorithm I would like to implement, which involves

  1. coordinatewise addition,
  2. coordinatewise multiplication, and
  3. cyclic rotation of coordinates.

My addition and multiplication are a little annoying (they are arithmetic modulo n), but I imagine I can port over standard algorithms, e.g. using Montgomery arithmetic, so am not focusing on that topic now.

For this last step, I mean that mapping a vector [a, b, c, d] -> [b, c, d, a] -> [c, d, a, b] -> [d, a, b, c] -> [a,b,c,d], though in general my vectors will be ~1000 dimensions, so the general cyclic rotation will be somewhat more complex.

What is the best way to accomplish this third step that I require? Looking into the Thrust library, it seems that the simplest thing to do is to use a permutation_iterator. This should suffice to provide a correct implementation, but given that the permutation I want to compute is exceedingly simple, I was wondering if this is the fastest option.

Also, while this question is tagged thrust, if there is some obviously better library/framework to use, I would appreciate pointers.

Upvotes: 1

Views: 134

Answers (1)

Homer512
Homer512

Reputation: 13295

If you use a transform iterator to build the permutation on-the-fly, it should be very efficient; certainly faster than actually moving the data.

This should work:

#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>

struct rotate_right : public thrust::unary_function<int,int>
{
  int rotate_by, end;

  rotate_right(int rotate_by, int end) noexcept
  : rotate_by(rotate_by), end(end)
  { assert(rotate_by >= 0 && rotate_by <= end); }

  __host__ __device__
  int operator()(int idx) const noexcept
  {
      idx += rotate_by;
      if(idx >= end)
          idx -= end;
      return idx;
  }
};

template<class Iterator>
auto make_rotate_right(Iterator unrotated, int rotate_by, int len)
{
  using counting_iterator = thrust::counting_iterator<int>;
  counting_iterator base_index(0);
  using index_iterator = thrust::transform_iterator<rotate_right, counting_iterator>;
  index_iterator permutation{base_index, rotate_right{rotate_by, len}};
  return thrust::make_permutation_iterator(unrotated, permutation);
}

thrust::device_vector<float> foo()
{
  int n = 4;
  thrust::device_vector<float> v(n);
  v[0] = 1.0f;
  v[1] = 4.0f;
  v[2] = 9.0f;
  v[3] = 16.0f;
  auto rotated = make_rotate_right(v.begin(), 2, n);
  return {rotated, rotated + n};
}

int main()
{
  auto f = foo();
  for(auto fi: f)
    std::cout << fi << '\n';
}

Prints

9
16
1
4

Upvotes: 3

Related Questions