Is there an R function that performs LASSO regression on multiply imputed datasets and pools the results?

Question

I have a dataset with 283 observations of 60 variables. My outcome variable (Diagnosis) is dichotomous: each patient has one of two diseases. These two diseases often overlap considerably, and I am trying to find the features that help differentiate them. I understand that LASSO logistic regression is a good approach for this kind of feature selection; however, it cannot be run on an incomplete dataset.

So I imputed my missing data with the mice package in R and found that approximately 40 imputations are appropriate for the amount of missing data I have.

Now I want to perform LASSO logistic regression on all 40 imputed datasets, but I am stuck at the step where I need to pool the results across them.

The with() function from mice does not work with glmnet.
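Because with() cannot be used here, one common workaround is to fit cv.glmnet() to each completed dataset separately (extracted with mice's complete()) and then summarise across imputations, for example by how often each predictor is selected and by its average coefficient. Rubin's rules, which pool() applies to glm fits, do not carry over directly, because glmnet does not return standard errors to combine. The sketch below is an illustration under stated assumptions: it needs the mice and glmnet packages, the data frame `sim` is a hypothetical simulated stand-in for WMT1, and the 50% selection threshold is a heuristic, not a standard pooling rule.

```r
library(mice)
library(glmnet)

# Hypothetical stand-in for WMT1: 8 simulated predictors with ~10% missing
# values and a two-level outcome 'Diagnosis' driven only by V1 and V2.
set.seed(42)
n <- 283
p <- 8
sim <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
eta <- 1.5 * sim$V1 - 1.0 * sim$V2
sim$Diagnosis <- factor(rbinom(n, 1, plogis(eta)),
                        labels = c("DiseaseA", "DiseaseB"))
for (j in seq_len(p)) sim[sample(n, 28), j] <- NA

# m = 5 keeps the sketch fast; the question uses m = 40.
imp <- mice(sim, m = 5, printFlag = FALSE, seed = 1)

# Fit a cross-validated LASSO to each completed dataset separately.
fits <- lapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  x <- model.matrix(Diagnosis ~ ., data = d)[, -1]  # drop intercept column
  cv.glmnet(x, d$Diagnosis, family = "binomial", alpha = 1)
})

# Coefficients at lambda.min: one row per term, one column per imputation.
coefs <- sapply(fits, function(f) as.matrix(coef(f, s = "lambda.min"))[, 1])

# Pool: selection frequency per predictor (intercept excluded) and the
# average coefficient across imputations (zeros included).
sel_freq <- rowMeans(coefs[-1, , drop = FALSE] != 0)
avg_coef <- rowMeans(coefs)

# Heuristic: keep predictors selected in at least half of the imputations.
selected <- names(sel_freq)[sel_freq >= 0.5]
print(sort(sel_freq, decreasing = TRUE))
print(round(avg_coef, 3))
```

For the real data, replace `sim` with WMT1 and `m = 5` with `m = 40`; note that each imputation then tunes its own lambda.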

# Impute the dataset with missing values using the mice package:

library(mice)
imp <- mice(WMT1, m = 40)

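# Fit an ordinary logistic regression on each imputed dataset
# (glm.mids() still runs but is deprecated in current mice versions in favour of with()):

imp.fit <- glm.mids(Diagnosis ~ ., data = imp,
                    family = binomial)
# Pool the results across all 40 imputed datasets:

summary(pool(imp.fit))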

The above works fine for ordinary logistic regression with glm(), but when I try the same approach for LASSO regression:

# First perform cross validation to find optimal lambda value:

CV <- cv.glmnet(Diagnosis~., data = imp,
                     family = "binomial", alpha = 1, nlambda = 100)

When I run the cross-validation, I get this error message:

 Error in as.data.frame.default(data) : 
    cannot coerce class ‘"mids"’ to a data.frame
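The error says that a mids object cannot be turned into a data frame: a mids object stores the original incomplete data together with the imputations, so each completed dataset has to be extracted with complete() first. glmnet's own cv.glmnet() also expects a numeric predictor matrix x and a response y rather than a formula. A minimal sketch of that conversion for one imputation, assuming the nhanes2 example data that ships with mice, with its two-level factor hyp standing in for Diagnosis:

```r
library(mice)

# nhanes2 ships with mice: 25 rows, missing values in bmi, hyp and chl.
# Its two-level factor 'hyp' stands in for 'Diagnosis' (an assumption).
imp <- mice(nhanes2, m = 5, printFlag = FALSE, seed = 1)
print(class(imp))  # "mids": not a data frame

# Extract the first completed dataset as an ordinary data frame.
d1 <- complete(imp, 1)

# Expand factors into dummy columns; drop the intercept column because
# glmnet fits its own intercept.
x <- model.matrix(hyp ~ ., data = d1)[, -1]
y <- d1$hyp
str(x)
```

The resulting x and y can be passed to cv.glmnet(x, y, family = "binomial", alpha = 1); repeating this for each i from 1 to imp$m gives one fit per imputation.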

Can somebody help me with this problem?
