Reputation: 107
I am trying to compute a series of residual sums of squares (RSS) in a C++ program using a for loop, and I used RcppEigen.package.skeleton() to combine C++ and R seamlessly. However, when I run it on data X with 788 rows × 857 cols and Y with 788 rows × 1 col, the run-time of the C++ program is user (4.62s), system (3.87s), elapsed (8.51s), and the run-time of the R program is user (8.68s), system (1.78s), elapsed (10.53s). The C++ program is barely faster than R. The platform I use is Win7 (x64) with 8 GB RAM. How can I speed up my program? Any help will be appreciated.
Here is the C++ program:
#include <RcppEigen.h>

//*---get Residual Sum of Squares (RSS) via matrix operations
//fastLm()
double getRSS(const Eigen::MatrixXd& X, const Eigen::MatrixXd& Y){
    Eigen::MatrixXd RSS = ((Y-X*((X.transpose()*X).inverse()*X.transpose()*Y)).transpose())*(Y-X*((X.transpose()*X).inverse()*X.transpose()*Y));
    double RSSd = RSS.determinant();
    return RSSd;
}

//*---get F value from RSS and degrees of freedom (partial F-test):
//    F = ((RSS1-RSS2)/(n1-n2)) / (RSS2/(nObs-n2-1))
double getFval(double RSS1, double RSS2, int n1, int n2, int nObs){
    return (RSS1-RSS2)/(n1-n2)/(RSS2/(nObs-n2-1));
}

//*---remove p columns of the matrix, starting at column i
Eigen::MatrixXd removeColumn(const Eigen::MatrixXd& matrix, unsigned int i, int p){
    unsigned int numRows = matrix.rows();
    unsigned int numCols = matrix.cols()-p;
    Eigen::MatrixXd X = matrix;
    if( i < numCols )
        X.block(0,i,numRows,numCols-i) = matrix.block(0,i+p,numRows,numCols-i);
    X.conservativeResize(numRows,numCols);
    return X;
}
// [[Rcpp::export]]
Rcpp::List getPIcvalue(bool findIn, int p, int n, const Eigen::VectorXd& varIn, const Eigen::MatrixXd& Y, const Eigen::MatrixXd& Xf, const Eigen::MatrixXd& X0){
    // varIn = (0,1,0,1,...,0); p = 1: number of columns to add or eliminate;
    // findIn == false: add 1 column of Xf to X0; findIn == true: eliminate 1 column from X0. n = X0.rows().
    bool valid = true;
    double FitStat1 = 1e+10;
    int pointer = -2;
    double FitStat;
    int nR = n - X0.cols(); // residual df of the reduced model; n is X0.rows()
    int nF;                 // residual df of the full model (nF = nR-1 when findIn == false)
    double RSSr;
    double RSSf;
    double F_value;         // only used by the commented-out getPvalue() calls
    RSSr = getRSS(X0, Y);
    int k = findIn ? -p : p;
    Eigen::MatrixXd X(n, X0.cols()+k);
    if(!findIn){
        for(int i = 0; i < Xf.cols(); i++){
            if(0 == varIn[i]){
                X << X0, Xf.col(i); // X: X0 with the i-th column of Xf appended
                nF = n - X.cols();
                RSSf = getRSS(X, Y);
                FitStat = getFval(RSSr, RSSf, X.cols(), X0.cols(), n);
                //FitStat = getPvalue(F_value, nF, nR);
                if(FitStat < FitStat1){
                    FitStat1 = FitStat;
                    pointer = i;
                }
            } // varIn
        } // for i
    }else{
        for(int i = 1; i < X0.cols(); i++){
            X = removeColumn(X0, i, p);
            RSSf = getRSS(X, Y);
            FitStat = getFval(RSSf, RSSr, X0.cols(), X.cols(), n);
            //FitStat = getPvalue(F_value, nR, nF);
            if(FitStat < FitStat1){
                FitStat1 = FitStat;
                pointer = i;
            }
        } // for i
    } // findIn
    return Rcpp::List::create(Rcpp::Named("keyV") = FitStat1,
                              Rcpp::Named("keyP") = pointer+1,
                              Rcpp::Named("keyR") = valid);
}
Upvotes: 0
Views: 1579
Reputation: 18562
Your expression for the RSS matrix is extremely inefficient. You do this:
Eigen::MatrixXd RSS = (
(Y - X *
( ( X.transpose() * X ).inverse() * X.transpose() * Y )
).transpose() ) *
( Y - X *
( ( X.transpose() * X ).inverse() * X.transpose() * Y )
);
which is clearly very repetitive and re-computes the same expensive operations many times. Transposing a matrix should be very cheap, unless it ends up requiring a copy. But inverting a matrix (even a symmetric positive-definite one, as is the case here, which Eigen can't know unless you tell it) is very expensive. Heck, even matrix multiplications are expensive.
You might be thinking that Eigen does some under-the-hood magic to elide the redundant operations and find the most efficient sequence of operations to get the results. But Eigen remains fairly conservative on this front (relying on conservative expression templates resolved at compile-time, when it really should be using run-time expression optimizations), so it really won't do that much here. You need to help it remove the redundant operations by doing that work yourself.
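For instance, just hoisting the repeated subexpressions into temporaries already eliminates most of the duplicated work (a minimal sketch of that step; it still uses the explicit inverse, which the next point removes):

const Eigen::MatrixXd coef = ( X.transpose() * X ).inverse() * X.transpose() * Y; // computed once
const Eigen::MatrixXd residual = Y - X * coef;                                    // computed once
Eigen::MatrixXd RSS = residual.transpose() * residual;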
And finally, you can combine an inversion and a multiplication by doing a linear system solution instead (instead of A = inv(X) * B, you do solve(X * A = B)), which also allows you to specify the most appropriate decomposition (here, it's either llt or ldlt, depending on how well-conditioned you expect your matrix (Xt*X) to be).
You get this:
auto Xt = X.transpose(); //<- deduce the type with 'auto' to avoid copy-evaluation of the transpose.
const Eigen::MatrixXd A = X * ( Xt * X ).ldlt().solve(Xt);
const Eigen::MatrixXd Y_AY = Y - A * Y;
Eigen::MatrixXd RSS = Y_AY.transpose() * Y_AY;
But actually, you can further optimize this by realizing that X * (Xt * X)^-1 * Xt * Y is actually equivalent to X * B, where B is the least-squares solution to X * B = Y. If you use the QR method (don't use SVD here, it's total overkill and very slow; I don't understand why it's even mentioned in the Eigen docs as a viable method for linear least-squares (probably because the Eigen people are amateurs!)), you can do this:
const Eigen::MatrixXd B = X.colPivHouseholderQr().solve( Y );
const Eigen::MatrixXd Y_XB = Y - X * B;
Eigen::MatrixXd RSS = Y_XB.transpose() * Y_XB;
which should be way faster than what you had before (at least, in terms of time-complexity, this should be orders of magnitude faster).
Also, if Y happens to be a square matrix, then you should compute the determinant of Y_XB and square it, instead of computing the determinant of its product with its own transpose. This will remove one matrix multiplication (and the copy into RSS).
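And in your actual case, Y is a single column (788 x 1), so the determinant of the 1 x 1 product is just the residual sum of squares itself. A sketch of getRSS specialized to that case (assuming a single-response Y; I'm naming it getRSS1 to keep it distinct from your version):

// RSS for a single-response Y (n x 1): the 1 x 1 determinant of
// Y_XB^T * Y_XB is just the residual sum of squares, so a squared
// norm does the job without any extra matrix product.
double getRSS1(const Eigen::MatrixXd& X, const Eigen::VectorXd& Y){
    const Eigen::VectorXd B = X.colPivHouseholderQr().solve(Y);
    return (Y - X * B).squaredNorm();
}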
Finally, I haven't looked too much into your other functions (those that call getRSS), but you should do whatever you can to avoid recomputing, at every iteration, things that don't change or don't change much, like the QR decomposition of X. There are ways to maintain the QR decomposition through the changes to X, but that is more than I can elaborate on here, and probably not something you can do with Eigen.
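That said, even without factorization updates, you can avoid most of the per-candidate work in your add-one-column search. A sketch of the idea (under the assumptions that Y is a single column, as in your data, and that no candidate column is exactly collinear with X0; the function name and signature are mine, for illustration): factor X0 once per call, project Y and each candidate column onto the orthogonal complement of X0's column span, and the RSS after adding candidate i then follows from two dot products.

#include <RcppEigen.h>

// Sketch: score every candidate column of Xf against a fixed block X0 with a
// single QR factorization per call, instead of refactoring [X0, Xf.col(i)]
// from scratch inside the loop.
// [[Rcpp::export]]
Eigen::VectorXd rssWithEachCandidate(const Eigen::MatrixXd& X0,
                                     const Eigen::MatrixXd& Xf,
                                     const Eigen::VectorXd& Y){
    // Thin Q (n x p): an orthonormal basis of the column span of X0.
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(X0);
    const Eigen::MatrixXd Q = qr.householderQ()
                            * Eigen::MatrixXd::Identity(X0.rows(), X0.cols());
    // Residual of Y against X0, i.e. (I - Q*Q^T) * Y.
    const Eigen::VectorXd rY = Y - Q * (Q.transpose() * Y);
    const double RSS0 = rY.squaredNorm();
    Eigen::VectorXd rss(Xf.cols());
    for(int i = 0; i < Xf.cols(); ++i){
        // Residual of candidate column i against X0.
        const Eigen::VectorXd e = Xf.col(i) - Q * (Q.transpose() * Xf.col(i));
        const double ee = e.squaredNorm();
        const double d  = e.dot(rY);
        // Adding column i lowers the RSS by (e . rY)^2 / (e . e).
        rss[i] = (ee > 0.0) ? RSS0 - d * d / ee : RSS0;
    }
    return rss;
}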
Upvotes: 10