How does glmnet() handle both penalized and unpenalized covariates?

Question

Is it possible to fit a lasso model with both penalized and unpenalized covariates? That is, I want to estimate Y ~ gamma * X + beta * Z, where X is an n x p matrix of penalized features and Z is an n x q matrix of unpenalized covariates, which may be continuous or factor variables.

Thanks.

sahir · Accepted Answer

This is covered in the glmnet vignette, in the section called Penalty Factors. To leave some variables unpenalized, set their entries of penalty.factor to 0. Build a vector of length ncol(X) + ncol(Z) whose first ncol(X) entries are 1 (or any positive number) and whose remaining ncol(Z) entries are 0, and pass cbind(X, Z) as the design matrix. For example:

set.seed(1234)
n <- 100 # number of samples
px <- 5 # number of x variables
pz <- 5 # number of z variables
x <- matrix(rnorm(n*px), ncol = px)
z <- matrix(rnorm(n*pz), ncol = pz)

y <- x[,1] + x[,5] + 2*z[,1] + 3*rnorm(n) # generate response
penalty <- c(rep(1, px), rep(0, pz)) # penalty factor

plot(glmnet::glmnet(cbind(x,z), y, penalty.factor = penalty))

In the plot of the solution path, notice that 5 of the coefficients (those of the z variables) are never 0, because they are never penalized.
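The question also mentions factor covariates in Z, which the example above does not cover. Below is a minimal sketch, assuming Z is stored in a data frame with one numeric column and one factor (the names z1 and grp are hypothetical, chosen only for illustration). glmnet() expects a numeric matrix, so model.matrix() expands the factor into dummy columns, and the penalty vector then has to be built from the expanded column count rather than the number of original covariates. The last part checks the claim about the solution path numerically by counting, for each unpenalized column, how many lambda values give a zero coefficient in fit$beta. The glmnet call is guarded so the rest of the sketch still runs if the package is not installed.

```r
# Sketch: unpenalized covariates that include a factor.
# Assumption: Z lives in a data frame; the names z1 and grp are hypothetical.
set.seed(1234)
n  <- 100
px <- 5
x  <- matrix(rnorm(n * px), ncol = px,
             dimnames = list(NULL, paste0("x", seq_len(px))))
zdf <- data.frame(
  z1  = rnorm(n),
  grp = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# glmnet() needs a numeric matrix, so expand the factor into dummy columns.
# Drop the "(Intercept)" column because glmnet() fits its own intercept.
z <- model.matrix(~ z1 + grp, data = zdf)[, -1, drop = FALSE]

# The penalty vector must match the number of columns AFTER expansion
# (z1, grpb, grpc: 3 columns), not the 2 original covariates.
penalty <- c(rep(1, ncol(x)), rep(0, ncol(z)))
xz <- cbind(x, z)

# Simulated response, in the same spirit as the example above.
y <- x[, 1] + x[, 5] + 2 * zdf$z1 + (zdf$grp == "b") + 3 * rnorm(n)

if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(xz, y, penalty.factor = penalty)
  # fit$beta has one row per column of xz and one column per lambda value.
  beta <- as.matrix(fit$beta)
  # For each unpenalized column, count the lambda values where its coefficient is 0.
  zero_counts <- rowSums(beta[colnames(z), , drop = FALSE] == 0)
  print(zero_counts)
}
```

If the unpenalized columns behave as described, each printed count should be 0. With treatment contrasts (the R default), a factor with k levels becomes k - 1 dummy columns, so the zeros in penalty must cover all of those columns.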
