Aditya Kashi
Aditya Kashi

Reputation: 366

Dealing with a contiguous vector of fixed-size matrices for both storage layouts in Eigen

An external library gives me a raw pointer of doubles that I want to map to an Eigen type. The raw array is logically a big ordered collection of small dense fixed-size matrices, all of the same size. The main issue is that the small dense matrices may be in row-major or column-major ordering and I want to accommodate them both.

My current approach is as follows. Note that all the entries of a small fixed-size block (in the array of blocks) need to be contiguous in memory.

template<int bs, class Mattype>
void block_operation(double *const vals, const int numblocks)
{
    Eigen::Map<Mattype> mappedvals(vals, 
                Mattype::IsRowMajor ? numblocks*bs : bs,
                Mattype::IsRowMajor ? bs : numblocks*bs
            );

    for(int i = 0; i < numblocks; i++)
      if(Mattype::isRowMajor)
          mappedvals.template block<bs,bs>(i*bs,0) = block_operation_rowmajor(mappedvals);
      else
          mappedvals.template block<bs,bs>(0,i*bs) = block_operation_colmajor(mappedvals);
}

The calling function first figures out the Mattype (out of 2 options) and then calls the above function with the correct template parameter.

Thus all my algorithms need to be written twice and my code is interspersed with these layout checks. Is there a way to do this in a layout-agnostic way? Keep in mind that this code needs to be as fast as possible.

Ideally, I would Map the data just once and use it for all the operations needed. However, the only solution I could come up with was invoking the Map constructor once for every small block, whenever I need to access the block.

template<int bs, StorageOptions layout>
inline Map<Matrix<double,bs,bs,layout>> extractBlock(double *const vals, 
                                                     const int bindex)
{
  return Map<Matrix<double,bs,bs,layout>>(vals+bindex*bs*bs);
}

Would this function be optimized away to nothing (by a modern compiler like GCC 7.3 or Intel 2017 under -std=c++14 -O3), or would I be paying a small penalty every time I invoke this function (once for each block, and there are a LOT of small blocks)? Is there a better way to do this?

Upvotes: 0

Views: 242

Answers (1)

ggael
ggael

Reputation: 29205

Your extractBlock is fine, a simpler but somewhat uglier solution is to use a reinterpret cast at the start of block_operation:

using BlockType = Matrix<double,bs,bs,layout|DontAlign>;
BlockType* blocks = reinterpret_cast<BlockType*>(vals);

for(int i...)
  block[i] = ...;

This will work for fixed sizes matrices only. Also note the DontAlign which is important unless you can guaranty that vals is aligned on a 16 or even 32 bytes depending on the presence of AVX and bs.... so just use DontAlign!

Upvotes: 1

Related Questions