Reputation: 71
Suppose I have two Eigen Arrays defined as
Array<float, 10, 1> a;
Array<float, 4, 1> b;
Now, I would like to get an Array of Array<bool, 10, 4> result;
reading true
for every one of the 10 entries of a
where it is larger than every of the four entries of b
. One way to achieve that I thought of was:
Array<float, 10, 1> a;
Array<float, 4, 1> b;
// ... fill the arrays
Array<bool, 10, 4> result = a.replicate(1 , 4) > (b.transpose()).replicate(10, 1);
In reality, the matrix dimensions are much larger though. This approach works, but I have two questions.
First: if I replace Array<bool, 10, 4> result = ...
by auto result = ...
,using the result
is much slower for some reason (I use it after to do some calculations based on the outcome of the bool).
Second: While this works, I wonder if this is the most efficient approach as I don't know whether replicate requires copying. I could consider iterating over one of the two dimensions
Array<float, 10, 1> a;
Array<float, 4, 1> b;
// ... fill the arrays
Array<bool, 10, 4> result;
for(int i = 0; i < 4; i++){
result.col(i) = a > b(i, 0);
}
which would remove the necessity of using replicate
at the cost of adding an iteration.
Any help is greatly appreciated!
EDIT:
Here's the chunk of code of interest. It gets called very often and so speedup is of great concern. Right now, this eats up 90% of the overall execution time. It is part of line intersection detection for thousands of lines. My approach was to write all the checks in matrix expressions to avoid iterating over all line pairs. To do that, I build a matrix with n
rows for all n
lines and m
columns for all lines the other n
lines could collide with. Then, the line intersection formula can be applied coefficientwise over the big matrices which I hope brings speedup.
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x1 = rays.col(0).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x2 = (rays.col(2) + rays.col(0)).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y1 = rays.col(1).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y2 = (rays.col(3) + rays.col(1)).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x3 = obstacles.col(0).transpose().replicate(num_rays*num_ships, 1);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x4 = obstacles.col(2).transpose().replicate(num_rays*num_ships, 1);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y3 = obstacles.col(1).transpose().replicate(num_rays*num_ships, 1);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y4 = obstacles.col(3).transpose().replicate(num_rays*num_ships, 1);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> t_den = (x1-x2)*(y3-y4) -(y1-y2)*(x3-x4);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> t = (x1-x3)*(y3-y4) -(y1-y3)*(x3-x4)/t_den;
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> u = ((x1-x3)*(y1-y2) - (y1-y3)*(x1-x2));
Eigen::Array<bool, Eigen::Dynamic, Eigen::Dynamic> col_r = 0 <= t && 0 <= u && u <= t_den;
t_rays = col_r.select(t, 1000).rowwise().minCoeff();
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x1_ = ship_b_boxs.col(0).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x2_ = (ship_b_boxs.col(2)).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y1_ = ship_b_boxs.col(1).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y2_ = (ship_b_boxs.col(3)).replicate(1, 4*obs.size());
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x3_ = x3(Eigen::seq(0, 4*num_ships-1), Eigen::all);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> x4_ = x4(Eigen::seq(0, 4*num_ships-1), Eigen::all);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y3_ = y3(Eigen::seq(0, 4*num_ships-1), Eigen::all);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> y4_ = y4(Eigen::seq(0, 4*num_ships-1), Eigen::all);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> t_den_ = (x1_-x2_)*(y3_-y4_) -(y1_-y2_)*(x3_-x4_);
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> t_ = (x1_-x3_)*(y3_-y4_) -(y1_-y3_)*(x3_-x4_)/t_den_;
Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic> u_ = ((x1_-x3_)*(y1_-y2_) - (y1_-y3_)*(x1_-x2_));
Eigen::Array<bool, Eigen::Dynamic, Eigen::Dynamic> col_s = (0 <= t_ && t_ <= 1 && 0 <= u_ && u_ <= t_den_).rowwise().maxCoeff();
with 85% taken by inlined'Eigen::Array::Array'
. I can see that probably now all the temporary array constructions take a lot of time. However, storing these temporary variables is beneficial as they are used more than once. Is there any way to speed this up?
Upvotes: 1
Views: 419
Reputation: 702
First: if I replace Array<bool, 10, 4> result = ... by auto result = ...,using the result is much slower for some reason (I use it after to do some calculations based on the outcome of the bool).
As mentioned by @Homer512 in a comment, do not use auto in Eigen expressions
Second: While this works, I wonder if this is the most efficient approach as I don't know whether replicate requires copying
No, if you use replicate
within an expression and you do not actively store it in some intermediate variable or otherwise force the expression to be evaluated, there is no copying involved. As in typical Eigen style replicate
only returns an expression of the replication, see the doc, an actual object containing a full replica is only created if absolutely necessary.
In short, this involves copies:
Array<bool, 10, 4> a4r = a.replicate(1 , 4);
Array<bool, 10, 4> b4r = b.transpose().replicate(10, 1);
Array<bool, 10, 4> result = a4r > b4r;
while your expression does not:
Array<bool, 10, 4> result = a.replicate(1 , 4) > b.transpose().replicate(10, 1);
So I really think it's the most efficient way to do it, or close to it.
How is this magic possible? It is well explained in the doc.
Side note:
a.replicate<1,4>() > b.transpose().replicate<10,1>();
If you still do not believe there is no copying involved, you can check the source code, as of version 3.4.0
Eigen/src/Core/Replicate.h
template<typename MatrixType,int RowFactor,int ColFactor> class Replicate
: public internal::dense_xpr_base< Replicate<MatrixType,RowFactor,ColFactor> >::type
{
[...]
template<typename OriginalMatrixType>
EIGEN_DEVICE_FUNC
inline Replicate(const OriginalMatrixType& matrix, Index rowFactor, Index colFactor)
: m_matrix(matrix), m_rowFactor(rowFactor), m_colFactor(colFactor)
{
EIGEN_STATIC_ASSERT((internal::is_same<typename internal::remove_const<MatrixType>::type,OriginalMatrixType>::value),
THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE)
}
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index rows() const { return m_matrix.rows() * m_rowFactor.value(); }
EIGEN_DEVICE_FUNC EIGEN_CONSTEXPR
inline Index cols() const { return m_matrix.cols() * m_colFactor.value(); }
EIGEN_DEVICE_FUNC
const _MatrixTypeNested& nestedExpression() const
{
return m_matrix;
}
protected:
MatrixTypeNested m_matrix;
const internal::variable_if_dynamic<Index, RowFactor> m_rowFactor;
const internal::variable_if_dynamic<Index, ColFactor> m_colFactor;
};
Eigen/src/Core/DenseBase.h
const Replicate<Derived, Dynamic, Dynamic> replicate(Index rowFactor, Index colFactor) const
{
return Replicate<Derived, Dynamic, Dynamic>(derived(), rowFactor, colFactor);
}
Eigen/src/Core/CoreEvaluators.h
template<typename ArgType, int RowFactor, int ColFactor>
struct unary_evaluator<Replicate<ArgType, RowFactor, ColFactor> >
: evaluator_base<Replicate<ArgType, RowFactor, ColFactor> >
{
[...]
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
explicit unary_evaluator(const XprType& replicate)
: m_arg(replicate.nestedExpression()),
m_argImpl(m_arg),
m_rows(replicate.nestedExpression().rows()),
m_cols(replicate.nestedExpression().cols())
{}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
CoeffReturnType coeff(Index row, Index col) const
{
// try to avoid using modulo; this is a pure optimization strategy
const Index actual_row = internal::traits<XprType>::RowsAtCompileTime==1 ? 0
: RowFactor==1 ? row
: row % m_rows.value();
const Index actual_col = internal::traits<XprType>::ColsAtCompileTime==1 ? 0
: ColFactor==1 ? col
: col % m_cols.value();
return m_argImpl.coeff(actual_row, actual_col);
}
[...]
protected:
const ArgTypeNested m_arg;
evaluator<ArgTypeNestedCleaned> m_argImpl;
const variable_if_dynamic<Index, ArgType::RowsAtCompileTime> m_rows;
const variable_if_dynamic<Index, ArgType::ColsAtCompileTime> m_cols;
};
I know that the code is heavy and difficult to interpret if you are not familiar with this kind of mechanisms, that's why I've removed all but the most relevant part. What you need to retain is this: replicate
only creates an expression of type Replicate
which simply stores a reference to the original object, the number of row copies, and the number of column copies; then, when you want to retrieve a coefficient from the expression, the evaluator computes the appropriate index in the original matrix using modulo and returns the corresponding element.
Upvotes: 3