Lasse Hjort Jakobsen

Reputation: 41

Order of variables changes estimated coefficients in glmnet

I am working with the glmnet package in R and came across a problem when trying to reproduce an "old" classifier: if the explanatory variables are permuted (say, reversed in order), the coefficients returned by cv.glmnet are not equal to those obtained from the unpermuted design matrix.

For example, consider the following data:

library(glmnet)
set.seed(1)

#Set initial parameters
n <- 100
p <- 1000

#Simulate data
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- as.character(1:p)
beta <- rnorm(n = p, mean = 2, sd = 2)
beta[rbinom(p, size = 1, prob = 0.5) == 0] <- 0
y <- x %*% beta + rnorm(n, sd = 0.1)

Then run cv.glmnet with the LASSO penalty (alpha = 1) on both the design matrix x and the permuted version of x.

#Set parameters for cross validation with cv.glmnet
lambda <- exp(seq(-1, 1, length.out = 100))
alpha <- 1
foldid <- rep(1:10, each = 10)

#Run cross validation
fit <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = alpha, 
                 lambda = lambda, 
                 foldid = foldid)

#Save coefficients
coef1 <- as.matrix(coef(fit, s = "lambda.min"))

#Run cross validation with rearranged design matrix
order <- ncol(x):1
fit2 <- cv.glmnet(x = x[,order], y = y, family = "gaussian", alpha = alpha, 
                  lambda = lambda, 
                  foldid = foldid)

#Save coefficients
coef2 <- as.matrix(coef(fit2, s = "lambda.min"))
coef2 <- coef2[rownames(coef1),]
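
Indexing by the row names of coef1 realigns the permuted coefficients so the two vectors are directly comparable. A quick optional check that the alignment worked:

#Verify the coefficients are aligned by name before comparing
stopifnot(identical(names(coef2), rownames(coef1)))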

Then compare the coefficients, the cross-validation error, and the linear predictors.

> summary(coef2 - coef1)
       1             
 Min.   :-0.2738963  
 1st Qu.: 0.0000000  
 Median : 0.0000000  
 Mean   : 0.0003739  
 3rd Qu.: 0.0000000  
 Max.   : 0.3643040

> min(fit$cvm)
[1] 4584.373
> min(fit2$cvm)
[1] 4596.626

> summary(cbind(1,x) %*% coef2 - cbind(1, x) %*% coef1)
       1          
 Min.   :-0.5100  
 1st Qu.:-0.1613  
 Median : 0.0210  
 Mean   : 0.0000  
 3rd Qu.: 0.1333  
 Max.   : 0.6139

For all three measures, the models differ even though only the order of the variables has changed. Can anybody explain this?

Upvotes: 4

Views: 1242

Answers (2)

Wouter

Reputation: 21

glmnet computes the LASSO regularization path through coordinate descent (see, e.g., slide 15 of this talk by Trevor Hastie: http://web.stanford.edu/~hastie/TALKS/glmnet.pdf). Since the algorithm iterates over the coefficients one at a time, the order of the variables affects the path taken. Depending on the convergence threshold and the maximum number of iterations, the two runs can therefore stop at slightly different points, which shows up as differences in the final coefficient values. In the case of your example, try changing

fit <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = alpha, 
                 lambda = lambda, 
                 foldid = foldid)

to e.g.

fit <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = alpha, 
                 lambda = lambda, 
                 foldid = foldid, standardize = TRUE, thresh = 1e-20, maxit = 10^6)

and do the same for your fit2, for example:
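
fit2 <- cv.glmnet(x = x[,order], y = y, family = "gaussian", alpha = alpha, 
                  lambda = lambda, 
                  foldid = foldid, standardize = TRUE, thresh = 1e-20, maxit = 10^6)

This may take a minute or so to compute, but you will find the differences become negligibly small: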

> summary(coef2 - coef1)
       1             
 Min.   :-2.038e-08  
 1st Qu.: 0.000e+00  
 Median : 0.000e+00  
 Mean   : 1.050e-10  
 3rd Qu.: 0.000e+00  
 Max.   : 3.028e-08  
> 
> min(fit$cvm)
[1] 4598.242
> 
> min(fit2$cvm)
[1] 4598.242
> 
> summary(cbind(1,x) %*% coef2 - cbind(1, x) %*% coef1)
       1             
 Min.   :-5.175e-08  
 1st Qu.:-1.457e-08  
 Median :-2.959e-10  
 Mean   : 0.000e+00  
 3rd Qu.: 1.503e-08  
 Max.   : 5.555e-08  
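
With the much tighter convergence threshold (and larger iteration budget), both runs are taken essentially to the optimum of the same convex problem, so the order in which the coordinates are updated no longer matters.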

Upvotes: 2

Suspicious_Gardener

Reputation: 126

I believe it is because glmnet uses coordinate descent, which iterates over the variables one at a time to minimize the loss function. Permuting the variables changes the order in which they are updated, and hence the path taken towards the minimum. The sketch below makes this concrete.
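
Here is a minimal sketch of cyclic coordinate descent for the lasso objective (an illustration only, not glmnet's actual implementation, which adds standardization, covariance updates, warm starts, and other refinements). Running a fixed number of passes with the coordinates visited in forward versus reverse order lands at slightly different points; it reuses the x and y simulated in the question and an arbitrary lambda:

#Soft-thresholding operator used in each coordinate update
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

#One cyclic pass of coordinate descent for
#(1/(2n)) * ||y - X b||^2 + lambda * ||b||_1,
#visiting the coordinates in the order given by 'ord'
cd_pass <- function(X, y, b, lambda, ord) {
  n <- nrow(X)
  for (j in ord) {
    r <- y - X %*% b + X[, j] * b[j]  #partial residual excluding variable j
    b[j] <- soft(sum(X[, j] * r) / n, lambda) / (sum(X[, j]^2) / n)
  }
  b
}

#Five passes in forward and in reverse order from the same starting point
b_fwd <- b_rev <- rep(0, ncol(x))
for (k in 1:5) {
  b_fwd <- cd_pass(x, y, b_fwd, lambda = 1, ord = 1:ncol(x))
  b_rev <- cd_pass(x, y, b_rev, lambda = 1, ord = ncol(x):1)
}
max(abs(b_fwd - b_rev))  #nonzero: the two orders are at different points

Since the lasso objective is convex, both orders approach the same optimum as the number of passes grows, which is why tightening the convergence threshold (as in the other answer) makes the order dependence disappear.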

Upvotes: 1
