Reputation: 25
When I tried to run a lasso regression with my 51st variable as the response and the other 50 variables as predictors, I got the following error message:
lasso_now=cv.glmnet(x=as.matrix(scaledData[,-51]),y=as.matrix(scaledData[,51]),alpha=1,nfolds = 5,type.measure="mse",family = binomial(link = "logit"))
Error in if (any(const_vars)) { : missing value where TRUE/FALSE needed
My response variable is either 0 or 1, so I used logistic regression. My x contains both categorical and numerical variables.
Does anyone know why this happens, or is there a way to validate the data for this issue? Thanks in advance!
Upvotes: 1
Views: 591
Reputation: 46968
Check whether you have NA values. You get this error because glmnet checks whether any of your columns has a standard deviation of zero, and with an NA in a column that check evaluates to NA instead of TRUE or FALSE. For example, set the first entry of the fourth column to NA in the following dataset:
library(glmnet)
scaledData = data.frame(v1 = rnorm(100),v2=rnorm(100),
v3 = rbinom(100,1,0.5),v4 = rbinom(100,1,0.7))
scaledData[1,4] = NA
You can check the per-column mean and standard deviation with glmnet's internal helper:
glmnet:::weighted_mean_sd(as.matrix(scaledData[,-3]))
$mean
v1 v2 v4
0.03979154 0.14547529 NA
$sd
v1 v2 v4
0.8544635 1.0815797 NA
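To validate the data without relying on a glmnet internal, a check in base R along these lines may be enough. This is a minimal sketch on the same toy data; the names `na_per_column`, `rows_with_na` and `column_sd` are only illustrative, and `set.seed(1)` is added here purely for reproducibility.

```r
# Toy data mirroring the example above, with one NA planted in v4
set.seed(1)
scaledData <- data.frame(v1 = rnorm(100), v2 = rnorm(100),
                         v3 = rbinom(100, 1, 0.5), v4 = rbinom(100, 1, 0.7))
scaledData[1, 4] <- NA

# Predictor matrix: everything except the response in column 3
x <- as.matrix(scaledData[, -3])

na_per_column <- colSums(is.na(x))              # number of NAs in each predictor
rows_with_na  <- which(!complete.cases(x))      # rows containing at least one NA
column_sd     <- apply(x, 2, sd, na.rm = TRUE)  # an sd of 0 would flag a constant column

na_per_column
rows_with_na
column_sd
```

Any column with a nonzero count in `na_per_column` would be a candidate for the error shown in the question.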
Running cv.glmnet on this data gives the same error:
lasso_now=cv.glmnet(x=as.matrix(scaledData[,-3]),
y=as.matrix(scaledData[,3]),
alpha=1,nfolds = 5,type.measure="mse",
family = binomial(link = "logit"))
Error in if (any(const_vars)) { : missing value where TRUE/FALSE needed
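The message itself is what R's `if` produces when its condition is NA. The sketch below reproduces it in isolation; the vector `const_vars` is a hypothetical stand-in named after the variable in the error message, not a copy of glmnet's actual code.

```r
# Hypothetical stand-in: two columns known not to be constant,
# one whose status is unknown because its sd is NA
const_vars <- c(FALSE, FALSE, NA)

# any() cannot decide: no TRUE was seen, but the NA might have been one
any_result <- any(const_vars)

# if() needs a definite TRUE or FALSE, so it raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  if (any(const_vars)) "constant column found" else "no constant columns",
  error = function(e) conditionMessage(e)
)

any_result
msg
```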
One way to remove the rows that contain NA values:
scaledData = scaledData[complete.cases(scaledData),]
Then run it again. Note that for a binomial response, "mse" is generally not the measure you want; "deviance", "class" or "auc" are more appropriate.
lasso_now=cv.glmnet(x=as.matrix(scaledData[,-3]),
y=as.matrix(scaledData[,3]),
alpha=1,nfolds = 5,type.measure="deviance",
family = binomial(link = "logit"))
lasso_now
Call: cv.glmnet(x = as.matrix(scaledData[, -3]),
y = as.matrix(scaledData[,3]),
type.measure = "deviance", nfolds = 5, alpha = 1,
family = binomial(link = "logit"))
Measure: GLM Deviance
Lambda Index Measure SE Nonzero
min 0.07643 1 1.427 0.01681 0
1se 0.07643 1 1.427 0.01681 0
Upvotes: 1