Reputation: 157
Say I had a numpy array A of shape (55, 50, 2), and I would like to do the following operation:

B = np.dot(A[0, :, 0][:, None], A[0, :, 0][None, :])

i.e. compute the outer product of each row with itself (over i and k). Without using np.einsum, and obviously without any for loops, how can this operation be done by pure broadcasting and reshaping (if needed)?
Note: I'm tagging eigen3 here because essentially I want to rewrite this operation with Eigen::Tensor in terms of .broadcast() and .reshape() (it's easier for me to write it out in numpy first to get the general picture). So a direct Eigen solution would be much appreciated, but since python, and hence numpy, is more popular, I would accept a numpy solution as well.
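For reference, the full result I'm after would have shape (55, 50, 50, 2) and, written with the loops I'm trying to avoid, would look like this (assuming the per-(i, k) outer products are stacked as in np.einsum('ijk,ilk->ijlk', A, A)):

import numpy as np

A = np.random.rand(55, 50, 2)

# Reference only: for every i and k, the outer product of A[i, :, k] with itself
B = np.empty((55, 50, 50, 2))
for i in range(A.shape[0]):
    for k in range(A.shape[2]):
        B[i, :, :, k] = np.dot(A[i, :, k][:, None], A[i, :, k][None, :])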
Upvotes: 1
Views: 787
Reputation: 516
If I've understood correctly, then this operation can be done using the Eigen::Tensor module as follows:

#include <unsupported/Eigen/CXX11/Tensor>

int main(){
    // Define a tensor
    Eigen::Tensor<double, 3> A(55, 50, 2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long>{1, 1} };

    // Contract. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double, 4> B = A.contract(A, idx);
}
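To sanity-check what this contraction computes and how its indices are ordered, here is a numpy sketch of the same operation (np.tensordot is my stand-in for the Eigen contraction; A is just random data):

import numpy as np

A = np.random.rand(55, 50, 2)

# Contract axis 1 of A with axis 1 of A:
# B[i, k, l, m] = sum_j A[i, j, k] * A[l, j, m]
B = np.tensordot(A, A, axes=(1, 1))
print(B.shape)  # (55, 2, 55, 2)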
Parallel execution can be done with a small modification:
#include <unsupported/Eigen/CXX11/Tensor>

int main(){
    // Define a tensor
    Eigen::Tensor<double, 3> A(55, 50, 2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long>{1, 1} };

    // Define a parallel device
    int num_threads = 4;
    Eigen::ThreadPool tpl(num_threads);
    Eigen::ThreadPoolDevice dev(&tpl, num_threads);

    // Initialize the result buffer B. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double, 4> B(55, 2, 55, 2);

    // Contract using the parallel device
    B.device(dev) = A.contract(A, idx);
}
See on compiler-explorer. Note the compiler flags, in particular -DEIGEN_USE_THREADS. Also, I had to set -O0 because -O3 timed out.
On the topic of performance discussed in the comments: in my experience using it daily for the last ~4 years, contractions with the Eigen::Tensor module perform very well; I haven't been able to find or produce anything faster (i.e. for dense, dynamically-sized tensors in shared memory). According to this answer (by the original author of Eigen), the contractions fall back to the matrix product kernels whenever possible, which should give some confidence. In that case you can use any BLAS library as the backend.
Upvotes: 1
Reputation:
So what you need is an outer product for each row. If you are using C++ Eigen, it is better to write loops and do a matrix multiplication; you can dispatch to BLAS for performance. If you want a way using numpy, a messy solution is:
B = A[..., np.newaxis].transpose(0, 2, 1, 3)
C = B @ B.transpose(0, 1, 3, 2)  # batched matrix multiplication
C = C.transpose(0, 3, 2, 1)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), C)  # check if both are identical
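For completeness, the same ijlk result can also be produced by pure broadcasting, which is closer to what the question asked for (a sketch: insert length-1 axes so the elementwise product forms each row's outer product):

D = A[:, :, np.newaxis, :] * A[:, np.newaxis, :, :]  # shape (55, 50, 50, 2)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), D)  # True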
Upvotes: 3