Reputation: 157
Say I had a numpy array A of shape (55, 50, 2), and I would like to do the following operation:

B = np.dot(A[0, :, 0][:, None], A[0, :, 0][None, :])

i.e. compute the outer product of each row with itself (over i and k). Without using np.einsum, and obviously without any for loops, how can this operation be done by pure broadcasting and reshaping (if needed)?
Note: I'm tagging eigen3 here because essentially I want to rewrite this operation with Eigen::Tensor in terms of .broadcast() and .reshape() (it's easier for me to write it out in numpy first to get the general picture). So a direct Eigen solution would be much appreciated, but since python, and hence numpy, is more popular, I would accept a numpy solution as well.
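For reference, the full result I'm after would have shape (55, 50, 50, 2) and, written with the loops I'm trying to avoid, would look like this (assuming the per-(i, k) outer products are stacked as in np.einsum('ijk,ilk->ijlk', A, A)):

import numpy as np

A = np.random.rand(55, 50, 2)

# Reference only: for every i and k, the outer product of A[i, :, k] with itself
B = np.empty((55, 50, 50, 2))
for i in range(A.shape[0]):
    for k in range(A.shape[2]):
        B[i, :, :, k] = np.dot(A[i, :, k][:, None], A[i, :, k][None, :])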
Upvotes: 1
Views: 787
Reputation: 516
If I've understood correctly, then this operation can be done using the Eigen::Tensor module as follows:

#include <unsupported/Eigen/CXX11/Tensor>

int main(){
    // Define a tensor
    Eigen::Tensor<double, 3> A(55, 50, 2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long>{1, 1} };

    // Contract. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double, 4> B = A.contract(A, idx);
}
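To sanity-check what this contraction computes and how its indices are ordered, here is a numpy sketch of the same operation (np.tensordot is my stand-in for the Eigen contraction; A is just random data):

import numpy as np

A = np.random.rand(55, 50, 2)

# Contract axis 1 of A with axis 1 of A:
# B[i, k, l, m] = sum_j A[i, j, k] * A[l, j, m]
B = np.tensordot(A, A, axes=(1, 1))
print(B.shape)  # (55, 2, 55, 2)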
Parallel execution can be done with a small modification:
#include <unsupported/Eigen/CXX11/Tensor>

int main(){
    // Define a tensor
    Eigen::Tensor<double, 3> A(55, 50, 2);
    A.setRandom();

    // Define the pairs of indices to multiply (i.e. index 1 on A twice)
    std::array<Eigen::IndexPair<long>, 1> idx = { Eigen::IndexPair<long>{1, 1} };

    // Define a parallel device
    int num_threads = 4;
    Eigen::ThreadPool tpl(num_threads);
    Eigen::ThreadPoolDevice dev(&tpl, num_threads);

    // Initialize the result buffer B. B will have dimensions [55,2,55,2]
    Eigen::Tensor<double, 4> B(55, 2, 55, 2);

    // Contract using the parallel device
    B.device(dev) = A.contract(A, idx);
}
See on compiler-explorer. Note the compiler flags, in particular -DEIGEN_USE_THREADS. Also, I had to set -O0 because -O3 timed out.
On the topic of performance discussed in the comments: in my experience using it daily for the last ~4 years, contractions with the Eigen::Tensor module perform very well; I haven't been able to find or produce anything faster (i.e. for dense, dynamically-sized tensors in shared memory). According to this answer (by the original author of Eigen), the contractions fall back to the matrix product kernels whenever possible, which should give some confidence. In that case you can use any BLAS library as the backend.
Upvotes: 1
Reputation:
So what you need is an outer product for each row. If you are using C++ Eigen, it is better to write loops and do a matrix multiplication; you can dispatch to BLAS for performance. If you want a way using numpy, a messy solution is:
B = A[..., np.newaxis].transpose(0, 2, 1, 3)
C = B @ B.transpose(0, 1, 3, 2)  # batched matrix multiplication
C = C.transpose(0, 3, 2, 1)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), C)  # check if both are identical
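For completeness, the same ijlk result can also be produced by pure broadcasting, which is closer to what the question asked for (a sketch: insert length-1 axes so the elementwise product forms each row's outer product):

D = A[:, :, np.newaxis, :] * A[:, np.newaxis, :, :]  # shape (55, 50, 50, 2)
np.array_equal(np.einsum('ijk,ilk->ijlk', A, A), D)  # True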
Upvotes: 3