Reputation: 169
I am running regression models with the function cv.glmnet(). The argument standardize = TRUE standardises all x variables (predictors) before the model is fitted, but the coefficients in the output are always returned on the original scale.
Is there a way to obtain standardized coefficients (beta weights) in the output, so that the coefficients are comparable?
Upvotes: 4
Views: 2047
Reputation: 46908
When you standardize (scale), you compute (x - mean(x))/sd(x). When a regression is fitted on these values, the centering part (- mean(x)) is absorbed into the intercept, so only the division by the standard deviation affects the coefficients: writing z = (x - mean(x))/sd(x), a fitted model y = a + b*x becomes y = (a + b*mean(x)) + (b*sd(x))*z, so the standardized coefficient is simply b*sd(x).
So to go from the unscaled coefficients to the scaled ones, multiply each coefficient by the standard deviation of its predictor.
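As a quick single-predictor sanity check (a sketch using the built-in mtcars data, with lm rather than glmnet, since the relationship is the same):

```r
# Single-predictor check: the slope on standardized x equals
# the slope on raw x multiplied by sd(x).
x <- mtcars$wt
y <- mtcars$mpg
b_raw <- coef(lm(y ~ x))[2]          # slope on the original scale
b_std <- coef(lm(y ~ scale(x)))[2]   # slope on the standardized scale
all.equal(unname(b_std), unname(b_raw * sd(x)))  # TRUE
```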
We can check this, first the regression on scaled x variables:
scaled_mt = mtcars
scaled_mt[,-1] = scale(scaled_mt[,-1])   # standardize all predictors, leave mpg as-is
fit_scaled = lm(mpg ~ ., data = scaled_mt)
The regression on original:
fit = lm(mpg ~ .,data=mtcars)
The glmnet fit, where I set very low lambda values so that all terms are kept:
library(glmnet)
fit_lasso = cv.glmnet(y = as.matrix(mtcars[, 1]), x = as.matrix(mtcars[, -1]),
                      lambda = c(0.0001, 0.00001))
The standard deviations of all x variables (note that the name must match the allSD used below; R is case-sensitive):
allSD = apply(mtcars[,-1], 2, sd)
To show that the transformation works:
cbind(scaled=coefficients(fit_scaled)[-1],
from_lm = coefficients(fit)[-1]*allSD,
from_glmnet = coefficients(fit_lasso)[-1]*allSD)
scaled from_lm from_glmnet
cyl -0.1990240 -0.1990240 -0.1762826
disp 1.6527522 1.6527522 1.6167872
hp -1.4728757 -1.4728757 -1.4677513
drat 0.4208515 0.4208515 0.4268243
wt -3.6352668 -3.6352668 -3.6071975
qsec 1.4671532 1.4671532 1.4601126
vs 0.1601576 0.1601576 0.1615794
am 1.2575703 1.2575703 1.2563485
gear 0.4835664 0.4835664 0.4922507
carb -0.3221020 -0.3221020 -0.3412025
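As a side note, with cv.glmnet you would normally extract coefficients at a specific cross-validated lambda rather than relying on the default. A sketch (assuming the glmnet package is installed) of pulling them at lambda.min and rescaling:

```r
library(glmnet)

# Cross-validated lasso fit on the mtcars predictors
fit_lasso = cv.glmnet(y = as.matrix(mtcars[, 1]), x = as.matrix(mtcars[, -1]))

# Coefficients at the lambda that minimizes CV error
# (a sparse one-column matrix, intercept first)
b = coef(fit_lasso, s = "lambda.min")

# Rescale each non-intercept coefficient by its predictor's standard deviation
allSD = apply(mtcars[, -1], 2, sd)
std_beta = as.vector(b)[-1] * allSD
```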
But note that this does not necessarily make the coefficients comparable, because they are scaled by standard deviations. The more important purpose of standardizing is centering, which makes positive and negative relationships easier to interpret.
Upvotes: 7