Reputation: 8828
I am trying to use LASSO for variable selection, and attempted the implementation in R using the glmnet package. This is the code I have written so far:
set.seed(1)
library(glmnet)
return = matrix(ret.ff.zoo[which(index(ret.ff.zoo) == beta.df$date[1]),])
data = matrix(unlist(beta.df[which(beta.df$date == beta.df$date[1]),][,-1]), ncol = num.factors)
dimnames(data)[[2]] <- names(beta.df)[-1]
model <- cv.glmnet(data, return, standardize = TRUE)
coef(model)
This is what I obtain when I run it the first time:
> coef(model)
15 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 0.009159452
VAL .
EQ .
EFF .
SIZE 0.018479078
MOM .
FSCR .
MSCR .
SY .
URP .
UMP .
UNIF .
OIL .
DEI .
PROD .
BUT, this is what I obtain when I run the SAME code once more:
> coef(model)
15 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 0.008031915
VAL .
EQ .
EFF .
SIZE 0.021250778
MOM .
FSCR .
MSCR .
SY .
URP .
UMP .
UNIF .
OIL .
DEI .
PROD .
I am not sure why the model behaves this way. How would I be able to choose a final model if the coefficients change at every run? Does it use a different tuning parameter $\lambda$ at every run? I thought that cv.glmnet uses model$lambda.1se by default?!
I have just started learning about this package, and would appreciate any help I can get!
Thank you!
Upvotes: 1
Views: 1366
Reputation: 351
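Just a supplement to the answer of @nograpes. The same seed should be set each time, immediately before fitting the model. In short, a seed set once covers only the first fit, because each call to cv.glmnet draws random numbers and so advances the random number generator. For example,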
set.seed(1)
model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
For the code above, the coefficients of model1 and model2 could be different.
set.seed(1)
model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
set.seed(1)
model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
The results are exactly the same only if you set the same seed before fitting each model.
Upvotes: 1
Reputation: 31
You need to feed the same nfolds and foldid to both models. Check help(cv.glmnet) for more details. This makes the cross-validation identical, and you should get the same model if you run the models on the same data set.
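As a rough sketch of the foldid approach, assuming synthetic data in place of the asker's (the names x, y, n, p and the 10-fold choice are illustrative, not from the question), the fold assignment can be drawn once and passed to every fit:

```r
library(glmnet)

# Synthetic data for illustration only (not the asker's data)
set.seed(42)
n <- 100
p <- 14
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 0.5 * x[, 1] + rnorm(n)

# Draw the fold assignment once: each observation gets a fold label 1..nfolds
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Reuse the same fold labels for both fits, so no random split happens inside
model1 <- cv.glmnet(x, y, foldid = foldid)
model2 <- cv.glmnet(x, y, foldid = foldid)

# Compare the selected tuning parameter and the resulting coefficients
same_lambda <- identical(model1$lambda.1se, model2$lambda.1se)
same_coef <- isTRUE(all.equal(as.matrix(coef(model1)), as.matrix(coef(model2))))
print(c(same_lambda = same_lambda, same_coef = same_coef))
```

Storing foldid alongside the data also makes the cross-validation reproducible without depending on the state of the random number generator at the time of the fit.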
Upvotes: 1
Reputation: 18323
The model isn't deterministic, because cv.glmnet assigns observations to cross-validation folds at random. Run set.seed(1) immediately before your model fit to produce deterministic results.
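A minimal sketch of this, assuming synthetic data and a hypothetical helper fit_once() (neither is from the question), resets the seed right before each fit and then inspects the tuning parameter that coef() uses by default:

```r
library(glmnet)

# Synthetic data for illustration only (not the asker's data)
set.seed(123)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(200)

# Hypothetical helper: reset the RNG immediately before the random fold split
fit_once <- function(seed) {
  set.seed(seed)
  cv.glmnet(x, y, standardize = TRUE)
}

a <- fit_once(1)
b <- fit_once(1)

# With the same seed, both fits select the same lambda.1se
print(c(a = a$lambda.1se, b = b$lambda.1se))

# coef() on a cv.glmnet object uses s = "lambda.1se" unless told otherwise;
# naming it explicitly makes the choice visible
print(coef(a, s = "lambda.1se"))
```

Without the seed reset, the folds differ between runs, so the cross-validated error curve and the lambda.1se picked from it can shift slightly, which is what moves the coefficients.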
Upvotes: 5