mat
mat

Reputation: 2617

fast method to calculate p-values from .lm.fit()

I am running simulations and fitting linear models with .lm.fit(). Although extremely fast, this function does not provide the predictors' p-values. Is there a fast way to compute them (maybe from the values returned by .lm.fit())? I am aware of this method to calculate approximate p-values, but I need the exact ones.

Update:
Dirk Eddelbuettel provided the fastest way to fit the lm and Ben Bolker a method to compute the p-values, by combining both answers we get:

set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)

mdl <- RcppArmadillo::fastLmPure(X, y)

pval <- 2*pt(abs(mdl$coefficients/mdl$stderr), mdl$df.residual, lower.tail=FALSE)

Upvotes: 5

Views: 1735

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226192

Dirk's answer will be faster, but in case it's convenient here's the implementation in pure R (extracting the bits you need from summary.lm, and assuming there are no issues with non-full-rank model matrices etc.)

Example:

set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)
m <- .lm.fit(X,y)

p-value calcs:

rss <- sum(m$residuals^2)
rdf <- length(y) - ncol(X)
resvar <- rss/rdf
R <- chol2inv(m$qr)
se <- sqrt(diag(R) * resvar)
2*pt(abs(m$coef/se),rdf,lower.tail=FALSE)

Compare with:

coef(summary(lm(y~X-1)))[,"Pr(>|t|)"]

Upvotes: 6

Dirk is no longer here
Dirk is no longer here

Reputation: 368251

For this issue (of getting standard errors and hence p-values) I wrote three different versions of a function fastLm() in package RcppArmadillo, RcppEigen and RcppGSL. Part of it is of course also just for exposition. But you could start there. Make sure you use the fastLmPure() variants taking a vector and matrix and not the formula interface -- all the time is spent deparsing the formula.

Here, just for kicks, is the RcppArmadillo variant:

#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
List fastLm_impl(const arma::mat& X, const arma::colvec& y) {
    int n = X.n_rows, k = X.n_cols;

    arma::colvec coef = arma::solve(X, y);    // fit model y ~ X
    arma::colvec res  = y - X*coef;           // residuals

    // std.errors of coefficients
    double s2 = std::inner_product(res.begin(), res.end(), 
                                   res.begin(), 0.0)/(n - k);

    arma::colvec std_err = 
         arma::sqrt(s2 *
                    arma::diagvec(arma::pinv(arma::trans(X)*X)));  

    return List::create(Named("coefficients") = coef,
                        Named("stderr")       = std_err,
                        Named("df.residual")  = n - k);
}

Upvotes: 2

Related Questions