Permuting the rows of a matrix (and corresponding vector) using CUDA/Thrust

Question

I would like to permute the rows of a matrix that is stored as an interleaved array (i.e. backed by a vector in row-major C-style format) and apply the same permutation to the elements of a corresponding vector.

Say the matrix dimensions are RxC and the corresponding vector has R elements.

My current idea is to generate a permutation of R indices and then use thrust::stable_sort_by_key to permute the vector as shown here.

I can then create another permutation vector that repeats each element of the one I previously created C times.

So if R = 4, C = 3 and the original permutation index vector was [4, 2, 3, 1] the permutation vector for the matrix would be [4, 4, 4, 2, 2, 2, 3, 3, 3, 1, 1, 1]. By using stable sort, the elements in a row of the matrix should not get permuted.

My question is then if there is a better/more efficient way to do this, using Thrust or plain CUDA.

Example:

Original matrix:

[ 1 1 1 1 ]
[ 2 2 2 2 ]
[ 3 3 3 3 ]
[ 4 4 4 4 ]
[ 5 5 5 5 ]

Original vector:

[1 2 3 4 5]

Permutation order:

[5 3 1 2 4]

Permuted matrix:

[ 5 5 5 5 ]
[ 3 3 3 3 ]
[ 1 1 1 1 ]
[ 2 2 2 2 ]
[ 4 4 4 4 ]

Permuted vector:

[5 3 1 2 4]

My use case is that I have a matrix of features and a vector of corresponding labels for each example. I would like to permute the matrix and apply the same permutation on the vector, as the shuffle step before an iteration of SGD. The reason I want to have contiguous rows and iterate through them is that I plan to use cuBLAS gemv to perform the matrix-vector operations, which assumes that the matrix is laid out in a similar way in memory (albeit in column-major format which means I need to call it like this)

Robert Crovella · Accepted Answer

My question is then if there is a better/more efficient way to do this, using Thrust

I believe there is. The permutation vector gives you all the information you need to directly copy the contents of the input matrix to the permuted matrix, without the need for sorting.

A useful thrust feature for this is the permutation_iterator. The permutation iterator allows us to re-order on-the-fly our selection of input elements to be used in any operation. If we provide an appropriate index calculation functor, we can pass a linear index (via a counting_iterator) to the index functor, to create (via a transform_iterator) the appropriate permuted input index for any element in the copy operation.

Here is a worked example:

$ cat t1061.cu
#include 
#include 
#include 
#include 
#include 
#include 
#include 

typedef int mytype;

struct copy_idx_func : public thrust::unary_function
{
  size_t c;
  unsigned *p;
  copy_idx_func(const size_t _c, unsigned *_p) : c(_c),p(_p) {};
  __host__ __device__
  unsigned operator()(unsigned idx){
    unsigned myrow = idx/c;
    unsigned newrow = p[myrow]-1;
    unsigned mycol = idx%c;
    return newrow*c+mycol;
  }
};


int main(){

  const mytype mat[]   = {1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5};
  const mytype vec[]   = {1,2,3,4,5};
  const unsigned per[] = {5,3,1,2,4};

  const size_t msize = sizeof(mat)/sizeof(mytype);
  const size_t vsize = sizeof(vec)/sizeof(mytype);
  const size_t psize = sizeof(per)/sizeof(unsigned);
  const size_t cols  = msize/vsize;
  // const size_t rows  = vsize;
  assert(msize%vsize == 0);
  assert(vsize == psize);

  thrust::device_vector   d_m(mat, mat+msize);
  thrust::device_vector   d_v(vec, vec+vsize);
  thrust::device_vector d_p(per, per+psize);
  thrust::device_vector   d_rm(msize);
  thrust::device_vector   d_rv(vsize);
  std::cout << "Initial Matrix:" << std::endl;
  thrust::copy_n(d_m.begin(), msize, std::ostream_iterator(std::cout, ","));

  // permute the matrix
  thrust::copy_n(thrust::make_permutation_iterator(d_m.begin(), thrust::make_transform_iterator(thrust::counting_iterator(0), copy_idx_func(cols,thrust::raw_pointer_cast(d_p.data())))), msize, d_rm.begin());

  std::cout << std::endl << "Permuted Matrix:" << std::endl;
  thrust::copy_n(d_rm.begin(), msize, std::ostream_iterator(std::cout, ","));
  std::cout << std::endl << "Initial Vector:" << std::endl;
  thrust::copy_n(d_v.begin(), vsize, std::ostream_iterator(std::cout, ","));

  // permute the vector
  thrust::copy_n(thrust::make_permutation_iterator(d_v.begin(), thrust::make_transform_iterator(thrust::counting_iterator(0),  copy_idx_func(1,thrust::raw_pointer_cast(d_p.data())))), vsize, d_rv.begin());

  std::cout << std::endl << "Permuted Vector:" << std::endl;
  thrust::copy_n(d_rv.begin(), vsize, std::ostream_iterator(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -o t1061 t1061.cu
$ ./t1061
Initial Matrix:
1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,
Permuted Matrix:
5,5,5,5,5,3,3,3,3,3,1,1,1,1,1,2,2,2,2,2,4,4,4,4,4,
Initial Vector:
1,2,3,4,5,
Permuted Vector:
5,3,1,2,4,
$

Notes:

Permuting the vector operationally is identical to permuting the matrix. We simply treat the vector as a matrix of one column.
As discussed in the comments, if the use-case were entirely within thrust, there might be no need to copy elements at all. The permutation_iterator allows us to select elements from the original matrix in any permuted order, and we can simply pass this construct to any thrust operation that needed the original matrix in a permuted order.

Permuting the rows of a matrix (and corresponding vector) using CUDA/Thrust

Answers (1)

Related Questions