Reputation: 83
Can anybody explain the following behavior of Eigen sparse matrices? I have been looking into aliasing and lazy evaluation, but I haven't been able to improve the situation. Tech specs: I am using the latest stable Eigen release on Ubuntu 16.10 with g++ and no optimization flags.
Suppose I define a simple identity in the following way:
SparseMatrix<double> spIdent(N,N);
spIdent.reserve(N);
spIdent.setIdentity();
then perform these operations with it
spIdent-spIdent;
spIdent*spIdent;
spIdent - spIdent*spIdent;
and measure the computation times for all three. What I get is this
0 Computation time: 2.6e-05
1 Computation time: 2e-06
2 Computation time: 1.10706
Meaning that each operation on its own is fast, but the combination is super slow. The noalias() method is only defined for dense matrices, and in my dense example it did not make much of a difference anyway. Any enlightenment?
MCVE:
#include <iostream>
#include <ctime>
#include "../Eigen/Sparse"
using namespace std;
using namespace Eigen;
int main() {
    unsigned int N=2000000;
    SparseMatrix<double> spIdent(N,N);
    spIdent.reserve(N);
    spIdent.setIdentity();
    clock_t start=clock();
    spIdent*spIdent;
    cout << "0 Computation time: " << float(clock() - start)/1e6 << '\n';
    start=clock();
    spIdent-spIdent;
    cout << "1 Computation time: " << float(clock() - start)/1e6 << '\n';
    start=clock();
    spIdent - (spIdent*spIdent);
    cout << "2 Computation time: " << float(clock() - start)/1e6 << '\n';
    return 0;
}
Upvotes: 3
Views: 1569
Reputation: 10596
It's not so much that it gets optimized out as that the lazy evaluation is, well, very lazy. Look at the product. The code that's called is (at least in whatever version of Eigen was included on this machine):
template<typename Derived>
template<typename OtherDerived>
inline const typename SparseSparseProductReturnType<Derived,OtherDerived>::Type
SparseMatrixBase<Derived>::operator*(const SparseMatrixBase<OtherDerived> &other) const
{
return typename SparseSparseProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
}
which returns an expression of the product (i.e. lazy). Nothing is done with this expression, so the cost is zero. The same goes for the difference. Now, when doing a - a*a, the a*a is an expression. It then meets the operator-, which sees the expression on the right hand side. This expression is then evaluated to a temporary (i.e. costs time) in order to use it in operator-. Why evaluate to a temporary? Read this for their logic (search for "The second circumstance").
The operator- is a CwiseBinaryOp with the product expression as the right hand side. The first thing CwiseBinaryOp does is assign the right hand side to a member:
EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp())
: m_lhs(aLhs), m_rhs(aRhs), m_functor(func)
(m_rhs(aRhs)), which in turn calls a SparseMatrix constructor:
/** Constructs a sparse matrix from the sparse expression \a other */
template<typename OtherDerived>
inline SparseMatrix(const SparseMatrixBase<OtherDerived>& other)
: m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
{
...
*this = other.derived();
}
which in turn calls operator=, which will (somebody correct me if I'm wrong) always trigger an evaluation, in this case to a temporary.
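A minimal sketch of my own (not from the question) that makes the same point: the bare product expression costs essentially nothing, assigning it to a SparseMatrix pays for the actual evaluation, and a subsequent difference against the already-materialized product involves no hidden temporary.
#include <iostream>
#include <ctime>
#include <Eigen/Sparse>
using namespace std;
using namespace Eigen;
int main() {
    const int N = 2000000;
    SparseMatrix<double> a(N,N);
    a.setIdentity();
    clock_t start = clock();
    a*a;                                   // only an expression object is built, essentially free
    cout << "expression only:   " << float(clock() - start)/CLOCKS_PER_SEC << '\n';
    start = clock();
    SparseMatrix<double> prod = a*a;       // the assignment forces the product to be evaluated
    cout << "evaluated product: " << float(clock() - start)/CLOCKS_PER_SEC << '\n';
    start = clock();
    SparseMatrix<double> diff = a - prod;  // right hand side is already a concrete matrix
    cout << "difference:        " << float(clock() - start)/CLOCKS_PER_SEC << '\n';
    return 0;
}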
Upvotes: 4
Reputation: 1304
I think, as @hfhc2 has mentioned, the first two statements in the code get optimized away entirely by the compiler (since the results are not needed in the rest). In the third statement, an auxiliary intermediate variable is most likely produced to store the temporary result of spIdent*spIdent.
To see this clearly, consider the following example which includes explicit copy-assignments:
#include <iostream>
#include <ctime>
#include <Eigen/Sparse>
using namespace std;
using namespace Eigen;
int main () {
    const unsigned int N = 2000000;
    SparseMatrix<double> spIdent(N,N);
    SparseMatrix<double> a(N,N), b(N,N), c(N,N);
    spIdent.reserve(N);
    spIdent.setIdentity();
    clock_t start = clock();
    a = spIdent*spIdent;
    cout << "0 Computation time: " << float(clock() - start)/1e6 << endl;
    start = clock();
    b = spIdent-spIdent;
    cout << "1 Computation time: " << float(clock() - start)/1e6 << endl;
    start = clock();
    c = a - b;
    cout << "2 Computation time: " << float(clock() - start)/1e6 << endl;
    return 0;
}
The measured times (without compiler optimizations) are [for openSUSE 12.2 (x86_64), g++ 4.7.1, Intel 2core 2GHz CPU]:
0 Computation time: 1.58737
1 Computation time: 0.417798
2 Computation time: 0.428174
which seems quite reasonable.
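As an aside (my addition, not part of the original answer): dividing by 1e6 only works because CLOCKS_PER_SEC happens to be 1,000,000 on typical Linux systems; a timer based on <chrono> avoids that assumption, e.g. for the first measurement:
#include <iostream>
#include <chrono>
#include <Eigen/Sparse>
using namespace std;
using namespace Eigen;
int main() {
    const unsigned int N = 2000000;
    SparseMatrix<double> spIdent(N,N), a(N,N);
    spIdent.setIdentity();
    auto t0 = chrono::steady_clock::now();
    a = spIdent*spIdent;                   // forced evaluation, as in the example above
    chrono::duration<double> dt = chrono::steady_clock::now() - t0;
    cout << "0 Computation time: " << dt.count() << " s" << endl;
    return 0;
}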
Upvotes: 3
Reputation: 4391
Well, as people mentioned, in the first two statements the code gets optimized away entirely (I have tested with a current version of g++ and -O3 set). The disassembly shows this for the second statement:
400e78: e8 03 fe ff ff callq 400c80 <clock@plt> # timing begins
400e7d: 48 89 c5 mov %rax,%rbp
400e80: e8 fb fd ff ff callq 400c80 <clock@plt> # timing ends
For the third part there is actually something happening, and Eigen library code is called:
400ede: e8 9d fd ff ff callq 400c80 <clock@plt> # timing begins
400ee3: 48 89 c5 mov %rax,%rbp
400ee6: 8b 44 24 58 mov 0x58(%rsp),%eax
400eea: 39 44 24 54 cmp %eax,0x54(%rsp)
400eee: c6 44 24 20 00 movb $0x0,0x20(%rsp)
400ef3: 48 89 5c 24 28 mov %rbx,0x28(%rsp)
400ef8: 48 89 5c 24 30 mov %rbx,0x30(%rsp)
400efd: 48 c7 44 24 38 00 00 movq $0x0,0x38(%rsp)
400f04: 00 00
400f06: c6 44 24 40 01 movb $0x1,0x40(%rsp)
400f0b: 0f 85 99 00 00 00 jne 400faa <main+0x22a>
400f11: 48 8d 4c 24 1f lea 0x1f(%rsp),%rcx
400f16: 48 8d 54 24 20 lea 0x20(%rsp),%rdx
400f1b: 48 8d bc 24 90 00 00 lea 0x90(%rsp),%rdi
400f22: 00
400f23: 48 89 de mov %rbx,%rsi
400f26: e8 25 1a 00 00 callq 402950 <_ZN5Eigen13CwiseBinaryOpINS_8internal20scalar_difference_opIdEEKNS_12SparseMatrixIdLi0EiEEKNS_19SparseSparseProductIRS6_S8_EEEC1ES8_RSA_RKS3_>
400f2b: 48 8d bc 24 a0 00 00 lea 0xa0(%rsp),%rdi
400f32: 00
400f33: e8 18 02 00 00 callq 401150 <_ZN5Eigen12SparseMatrixIdLi0EiED1Ev>
400f38: e8 43 fd ff ff callq 400c80 <clock@plt> # timing ends
I guess that in this case the compiler is not able to figure out that the result of the computation is not used, as opposed to the first two cases.
If you take a look at the documentation, you can see that a simple operation like + on sparse matrices returns not a matrix but a CwiseBinaryOp expression representing the result. I figure that if you don't use this expression somewhere, then the resulting matrix is never constructed.
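One quick way to convince yourself of that (a small sketch of mine, assuming a C++11 compiler): the type of a sparse sum is an expression template rather than a SparseMatrix, and the actual matrix is only built once the expression is assigned (or .eval() is called).
#include <Eigen/Sparse>
#include <type_traits>
using namespace Eigen;
int main() {
    SparseMatrix<double> a(10,10), b(10,10);
    a.setIdentity();
    b.setIdentity();
    // The sum is only an expression template; no SparseMatrix is constructed here.
    static_assert(!std::is_same<decltype(a + b), SparseMatrix<double>>::value,
                  "a + b is an expression type, not a materialized matrix");
    // Assigning the expression (or calling .eval()) is what builds the result matrix.
    SparseMatrix<double> c = a + b;
    return 0;
}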
Upvotes: 4