Edison Lin

Reputation: 25

if (any(const_vars)) missing value where TRUE/FALSE needed error while running Lasso in R

When I tried to run a Lasso regression with my 51st variable as the response and the other 50 variables as predictors, I got the error in the title from this call:

lasso_now=cv.glmnet(x=as.matrix(scaledData[,-51]),y=as.matrix(scaledData[,51]),alpha=1,nfolds = 5,type.measure="mse",family = binomial(link = "logit"))

My response variable is either 0 or 1 so I used logistic regression. My x has either categorical or numerical variables.

Does anyone know why this happens, or is there a way to check the data for the issue? Thanks in advance!

Upvotes: 1

Views: 591

Answers (1)

StupidWolf

Reputation: 46968

Check whether your data contain NA values. You get the error because glmnet checks whether any of your columns has a standard deviation of zero, and for a column containing NA that check returns NA instead of TRUE or FALSE. For example, we set the first entry of the fourth column to NA in the following dataset:

library(glmnet)

scaledData = data.frame(v1 = rnorm(100),v2=rnorm(100),
v3 = rbinom(100,1,0.5),v4 = rbinom(100,1,0.7))

scaledData[1,4] = NA

You can check:

glmnet:::weighted_mean_sd(as.matrix(scaledData[,-3]))
$mean
        v1         v2         v4 
0.03979154 0.14547529         NA 

$sd
       v1        v2        v4 
0.8544635 1.0815797        NA 
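
A similar check can be sketched in base R without calling glmnet's internal function. This is only a minimal sketch: the helper name check_design_matrix is hypothetical (it is not part of glmnet), and it assumes x is already a numeric matrix.

```r
# Hypothetical helper (not part of glmnet): for a numeric matrix x, report
# how many NA values each column holds and whether the column is constant.
check_design_matrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  na_count <- colSums(is.na(x))
  # sd() returns NA if a column contains NA, so drop NAs before computing it
  col_sd <- apply(x, 2, sd, na.rm = TRUE)
  data.frame(
    column    = colnames(x),
    na_count  = as.integer(na_count),
    constant  = !is.na(col_sd) & col_sd == 0,
    row.names = NULL
  )
}

# Small demo data set, built the same way as the example above
set.seed(1)
demo <- data.frame(v1 = rnorm(100), v2 = rnorm(100),
                   v4 = rbinom(100, 1, 0.7))
demo[1, 3] <- NA

report <- check_design_matrix(as.matrix(demo))
print(report)
```

Any column with a non-zero na_count would trigger the error above, and a column flagged as constant has zero standard deviation.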

Running cv.glmnet on this data gives the same error:

lasso_now=cv.glmnet(x=as.matrix(scaledData[,-3]),
y=as.matrix(scaledData[,3]),
alpha=1,nfolds = 5,type.measure="mse",
family = binomial(link = "logit"))

Error in if (any(const_vars)) { : missing value where TRUE/FALSE needed

One way to remove the rows that contain NA values:

scaledData = scaledData[complete.cases(scaledData),]
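
Before dropping rows it can help to see how many would be lost. The following is a minimal sketch on made-up data; the names demo, keep and cleaned are only for illustration.

```r
# Made-up data with one missing value in the fourth column
set.seed(1)
demo <- data.frame(v1 = rnorm(100), v2 = rnorm(100),
                   v3 = rbinom(100, 1, 0.5), v4 = rbinom(100, 1, 0.7))
demo[1, 4] <- NA

# complete.cases() is TRUE for rows with no NA in any column
keep <- complete.cases(demo)
cat("rows dropped:", sum(!keep), "of", nrow(demo), "\n")

cleaned <- demo[keep, ]
stopifnot(!anyNA(cleaned))  # no missing values remain
```

If many rows would be dropped, it may be worth checking which columns hold the missing values before filtering.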

Then run it again. Note that for binomial you should not use "mse"; you can use "deviance", "class" or "auc".

lasso_now=cv.glmnet(x=as.matrix(scaledData[,-3]),
y=as.matrix(scaledData[,3]),
alpha=1,nfolds = 5,type.measure="deviance",
family = binomial(link = "logit"))

lasso_now

Call:  cv.glmnet(x = as.matrix(scaledData[, -3]), 
y = as.matrix(scaledData[,3]), 
type.measure = "deviance", nfolds = 5, alpha = 1, 
family = binomial(link = "logit")) 

Measure: GLM Deviance 

     Lambda Index Measure      SE Nonzero
min 0.07643     1   1.427 0.01681       0
1se 0.07643     1   1.427 0.01681       0

Upvotes: 1
