Lasso Regression glmnet - error regarding the input data

Question

I am trying to fit a Lasso regression model using glmnet(). As I have never worked with Lasso regression before, I followed some tutorials, but fitting the model always fails with the following error:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,: 
one multinomial or binomial class has 1 or 0 observations; not allowed

Working with the dataset from this question (https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome), it seems that the dependent variable y may only contain 0 and 1. Whenever I set one of the observations of y to 2, or to anything other than 0 or 1, I get this error.

This is my code:

lambdas_to_try <- 10^seq(-3, 5, length.out = 100)

x_vars <- as.matrix(data.frame(data$x1, data$x2, data$x3))
lasso_cv <- cv.glmnet(x_vars, y=as.factor(data$y), alpha = 1, lambda = lambdas_to_try, family = "binomial", nfolds = 10)

x_vars_2 <- model.matrix(data$y ~ data$x1 + data$x2 + data$x3)[, -1]
lasso_cv_2 <- cv.glmnet(x_vars_2, y=as.factor(data$y), alpha = 1, lambda = lambdas_to_try, family = "binomial", nfolds = 10)

And this is what my dataset looks like:

The problem is that in my data the y variable represents the number of crimes, so it takes integer values between 0 and 1000. I cannot restrict it to 0 and 1. How can I fit a Lasso regression to data like this?

StupidWolf · Accepted Answer

As @Gregor noted, what you have is count data, so this is a regression problem, not a classification problem. Using an example dataset, this is how you can implement it:

library(MASS)
library(glmnet)
data(Insurance)

Your response variable should be numeric:

str(Insurance)
'data.frame':   64 obs. of  5 variables:
 $ District: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Group   : Ord.factor w/ 4 levels "<1l"<"1-1.5l"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Age     : Ord.factor w/ 4 levels "<25"<"25-29"<..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Holders : int  197 264 246 1680 284 536 696 3582 133 286 ...
 $ Claims  : int  38 35 20 156 63 84 89 400 19 52 ...
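
The error in the question can be traced to as.factor(): wrapping a count in as.factor() turns every distinct count value into its own class, and the binomial family refuses any class with 1 or 0 observations. A minimal sketch of that check, using the Insurance data as a stand-in for the crime counts (y_counts and class_sizes are illustrative names, not part of the original code):

```r
library(MASS)
data(Insurance)

y_counts <- Insurance$Claims

# A numeric response is what a count model expects
is.numeric(y_counts)

# as.factor() creates one class per distinct count value
class_sizes <- table(as.factor(y_counts))
length(class_sizes)   # number of classes a classifier would see
min(class_sizes)      # size of the smallest class
```

If the smallest class has a single observation, a call with family = "binomial" stops with the message quoted in the question.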

Now we set the predictors and response variables:

y = Insurance$Claims
# Drop the intercept column; glmnet fits its own intercept
X = model.matrix(Claims ~ ., data = Insurance)[, -1]

Run cross-validation to find the best lambda (if you don't already know which penalty strength to use):

fit = cv.glmnet(x=X,y=y,family="poisson")
pred = predict(fit,X,s=fit$lambda.1se)

The prediction is on the log scale, so to compare it with your actual values, plot it against log(y):

plot(log(y),pred,xlab="log (actual)",ylab="log (predicted)")
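
If you would rather compare on the original count scale, predict() accepts type = "response". The following is a sketch under the same assumptions as the example above (Insurance data, default lasso penalty); set.seed() is added only because cv.glmnet assigns folds at random, and pred_counts is an illustrative name:

```r
library(MASS)
library(glmnet)
data(Insurance)

y <- Insurance$Claims
X <- model.matrix(Claims ~ ., data = Insurance)[, -1]

set.seed(1)  # make the random fold assignment reproducible
fit <- cv.glmnet(x = X, y = y, family = "poisson")

# Predicted counts on the original scale instead of the log scale
pred_counts <- predict(fit, newx = X, s = "lambda.1se", type = "response")

# Side-by-side view of actual and predicted counts
head(cbind(actual = y, predicted = as.vector(pred_counts)))

# Coefficients at the selected lambda; entries printed as "." were shrunk to zero
coef(fit, s = "lambda.1se")
```

For the data in the question, the same pattern should apply with x_vars as the predictor matrix and data$y passed as a plain numeric vector rather than through as.factor().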
