Sacha Epskamp
Sacha Epskamp

Reputation: 47551

How does glmnet compute the maximal lambda value?

The glmnet package uses a range of LASSO tuning parameters lambda scaled from the maximal lambda_max under which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, in a trivial dataset:

set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
fitGLM <- glmnet(x,y)
max(fitGLM$lambda)
# 0.1975946

The package vignette (http://www.jstatsoft.org/v33/i01/paper) describes in section 2.5 that it computes this value as follows:

sx <- as.matrix(scale(x))
sy <- as.vector(scale(y))
max(abs(colSums(sx*sy)))/100
# 0.1865232

Which clearly is close but not the same value. So, what causes this difference? And in a related question, how could I compute lambda_max for a logistic regression?

Upvotes: 10

Views: 7764

Answers (5)

Gaby Cohen
Gaby Cohen

Reputation: 116

To get the same result you need to standardize the variables using a standard deviation with n instead of n-1 denominator.

mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
sx <- scale(x,scale=apply(x, 2, mysd))
sx <- as.matrix(sx, ncol=20, nrow=100)
sy <- as.vector(scale(y, scale=mysd(y)))
max(abs(colSums(sx*sy)))/100
## [1] 0.1758808
fitGLM <- glmnet(sx,sy)
max(fitGLM$lambda)
## [1] 0.1758808

For the unscaled (original) x and y, the maximum lambda should be

mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
sx <- scale(x,scale=apply(x, 2, mysd))
norm(t(sx) %*% y, 'i') / nrow(x) 
## [1] 0.1975946
# norm of infinity is also equal to 
max(abs(colSums(sx*y)))/100
## [1] 0.1975946
max(fitGLM$lambda) - norm(t(sx) %*% y, 'i') / nrow(x)
## [1] 2.775558e-17

Upvotes: 10

Can
Can

Reputation: 11

Sorry, been a while, but maybe still of help:

You can calculate the maximum lambda value for any problem with L1-regularization by finding the highest absolute value of the gradient of the objective function (i.e. the score function for likelihoods) at the optimized parameter values for the completely regularized model (eg. all penalized parameters set to zero).

I sadly can't help with the difference in values, though. Although I can say that I try to use a max lambda value that is a bit higher - say 5% - than the calculated maximum lambda, so that the model with all selected parameterers constrained will surely be a part of the number of estimated models. Maybe this is what is being done in glmnet.

Edit: sorry, I confused the non-regularized with the fully penalized model. Edited it above now.

Upvotes: 1

Marjolein Fokkema
Marjolein Fokkema

Reputation: 626

It seems lambda_max for a logistic regression is calculated similarly as for linear regression, but with weights based on class proportions:

set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)

mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
sx <- scale(x, scale=apply(x, 2, mysd))
sx <- as.matrix(sx, ncol=20, nrow=100)

y_bin <- factor(ifelse(y<0, -1, 1))
prop.table(table(y_bin)) 
# y_bin
#   -1    1 
# 0.62 0.38 
fitGLM_log <- glmnet(sx, y_bin, family = "binomial")
max(fitGLM_log$lambda)
# [1] 0.1214006
max(abs(colSums(sx*ifelse(y<0, -.38, .62))))/100
# [1] 0.1214006

Upvotes: 5

TreeStump
TreeStump

Reputation: 63

For your second question, look to Friedman et al's paper, "Regularization paths for generalized linear models via coordinate descent". In particular, see equation (10), which is equality at equilibrium. Just check under what conditions the numerator $S(\cdot,\cdot)$ is zero for all parameters.

Upvotes: 3

Roland
Roland

Reputation: 132706

According to help("glmnet") the maximal lambda value is "the smallest value for which all coefficients are zero":

sum(fitGLM$beta[, which.max(fitGLM$lambda)])
#[1] 0
sum(glmnet(x,y, lambda=max(fitGLM$lambda)*0.999)$beta)
#[1] -0.0001809804

At a quick glance the value seems to be calculated by the Fortran code called by elnet.

Upvotes: 0

Related Questions