lefe

Reputation: 157

Numpy 3D Array/Eigen Tensor C++, multiply each row by its transpose

Say I had a numpy array A of shape (55, 50, 2), and I would like to do the following operation

B = np.dot(A[0, :, 0][:, None], A[0, :, 0][None, :])

i.e. compute the outer product of each row with itself (over i and k)

Without using np.einsum and, obviously, without any for loops, how can this operation be done by pure broadcasting and reshaping (if needed)?
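For reference, here is a minimal loop-based sketch of the intended result (the random data is only a stand-in; the result is equivalent to np.einsum('ijk,ilk->ijlk', A, A)):

import numpy as np
A = np.random.rand(55, 50, 2)          # stand-in data with the shape above
B = np.empty((55, 50, 50, 2))          # one 50x50 outer product per (i, k) pair
for i in range(55):                    # the loops this question wants to avoid
    for k in range(2):
        B[i, :, :, k] = np.outer(A[i, :, k], A[i, :, k])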

Note:

I'm tagging eigen3 here because essentially I want to rewrite this operation with Eigen::Tensor in terms of .broadcast() and .reshape() (it's easier for me to write it out in numpy first to get the general picture). A direct Eigen solution would therefore be much appreciated, but since Python, and hence numpy, is more popular, I would accept a numpy solution as well.

Upvotes: 1

Views: 787

Answers (2)

DavidAce

Reputation: 516

If I've understood correctly, this operation can be done using the Eigen::Tensor module as follows:

#include <unsupported/Eigen/CXX11/Tensor>
int main(){
    // Define a tensor
    Eigen::Tensor<double,3> A(55,50,2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long> {1, 1} };
    
    // Contract. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double,4> B = A.contract(A, idx);
}

See on compiler-explorer.

Parallel execution can be done with a small modification:

#include <unsupported/Eigen/CXX11/Tensor>
int main(){
    // Define a tensor
    Eigen::Tensor<double,3> A(55,50,2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long> {1, 1} };
    
    // Define a parallel device 
    int num_threads = 4;
    Eigen::ThreadPool       tpl (num_threads);
    Eigen::ThreadPoolDevice dev (&tpl, num_threads);

    // Initialize result buffer B. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double,4> B(55,2,55,2);

    // Contract using the parallel device
    B.device(dev) = A.contract(A, idx);
}

See on compiler-explorer. Note the compiler flags, in particular -DEIGEN_USE_THREADS. Also, I had to set -O0 because -O3 timed out.

On the topic of performance discussed in the comments: in my experience using it daily for the last ~4 years, contractions with the Eigen::Tensor module perform very well; I haven't been able to find or produce anything faster (i.e., for dense, dynamically sized tensors in shared memory). According to this answer (by the original author of Eigen), the contractions fall back to the matrix-product kernels when possible, which should give some confidence. In that case you can use any BLAS library as the backend.

Upvotes: 1

user10289025

Reputation:

So what you need is an outer product for each row.

If you are using Eigen in C++, it is better to write loops and do a matrix multiplication for each row; that way you can dispatch to BLAS for performance.

If you want a way to do it in NumPy, a messy solution is

import numpy as np
# A is the (55, 50, 2) array from the question
B = A[..., np.newaxis].transpose(0, 2, 1, 3)         # shape (55, 2, 50, 1)
C = B @ B.transpose(0, 1, 3, 2)                      # batched matrix multiplication -> (55, 2, 50, 50)
C = C.transpose(0, 3, 2, 1)                          # back to (55, 50, 50, 2)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), C)  # check that both are identical
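For comparison (a sketch, not part of the original answer): the same per-row outer product can also be written with pure broadcasting, by inserting singleton axes so the 50-long axis of A is multiplied against itself:

import numpy as np
A = np.random.rand(55, 50, 2)                        # stand-in data with the question's shape
D = A[:, :, None, :] * A[:, None, :, :]              # broadcasts to (55, 50, 50, 2)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), D)  # True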

Upvotes: 3
