I have a function that returns the AUC value for a cv.glmnet model. It intermittently (in fewer than half of the calls) throws the following error when executing cv.glmnet:
Error in drop(y %*% rep(1, nc)) : error in evaluating the argument 'x' in selecting a method for function 'drop': Error in y %*% rep(1, nc) : non-conformable arguments
I've read a little bit about the error and the only suggestion I could find was to use data.matrix() instead of as.matrix(). My function is as follows (where "form" is a formula with my desired variables and "dt" is the data frame):
library(glmnet)  # cv.glmnet
library(ROCR)    # prediction, performance

auc_cvnet <- function(form, dt, standard = FALSE) {
  vars <- all.vars(form)
  depM <- dt[[vars[1]]]                # response: first variable in the formula
  indM <- data.matrix(dt[vars[-1]])    # predictors: the remaining variables
  model <- cv.glmnet(indM, depM, family = "binomial", nfolds = 3,
                     type.measure = "auc", standardize = standard)
  pred <- predict(model, indM, type = "response")
  tmp <- prediction(pred, depM)
  auc.tmp <- performance(tmp, "auc")
  return(as.numeric(auc.tmp@y.values))
}
I'm calling this function from another function that iterates through combinations of a few variables to see which combinations work well (it's a pretty brute-force method). I printed out the formula for the iteration in which the error was thrown and called the function with just that formula, and it worked fine. So unfortunately I can't pinpoint which calls throw the error, otherwise I'd try to give more information. The data frame has about 30 rows, and there are no errors when I run my code on a larger data set with 110 rows. There are also no NAs in either data set.
Has anyone seen this before or have any thoughts? Thanks!
Upvotes: 3
Views: 3229
Reputation: 148
I have the same problem when running cv.glmnet on a dataset with 2 positive cases and 850 negative ones. In one of the cross-validation iterations (where the training and testing sets are randomly sampled) both positive cases are sampled-out of the training set. Thus, glmnet calls lognet, which in turn calls drop(y %*% rep(1, nc)) but y is a vector and not a matrix with at least two columns.
The easiest way I can think of is to specify the foldid parameter to cv.glmnet and make sure that there are at least two classes present in the data in every iteration.
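One way to build such a foldid vector is sketched below. This is only a minimal base-R illustration, not something glmnet provides: stratified_foldid is a hypothetical helper name, and it assumes every class has at least two observations, so that no class is confined to a single fold.

```r
# Hypothetical helper (not part of glmnet): assign fold ids so that each
# class is spread as evenly as possible across the folds.
stratified_foldid <- function(y, nfolds = 3) {
  y <- as.factor(y)
  foldid <- integer(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    # Shuffle the rows of this class, then deal out fold labels 1..nfolds in turn
    idx <- idx[sample.int(length(idx))]
    foldid[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  foldid
}
```

With the names from the question, this could be passed as cv.glmnet(indM, depM, family = "binomial", foldid = stratified_foldid(depM, 3), type.measure = "auc"). It only guarantees that both classes are present in each training split. With very few cases in one class, a split may still end up with a single observation of that class, which may not be enough for the fit.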
Upvotes: 1
Reputation: 91
Believe it or not, I actually got this same error today. Since I don't know your dataset, I can't say for sure what it is, but for me, the data I was passing as my y variable (your depM) was a column of all True values. cv.glmnet would only return a valid model if my y variable contained True and False values.
I wish I could explain why cv.glmnet required both True and False, but I have a lack of understanding of the function itself (as it is, I am only adapting code given to me). I just thought I'd post this in case it would give you some help troubleshooting. Good luck!
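If it helps with troubleshooting, a quick check on the y variable before calling cv.glmnet might look like the sketch below. has_both_classes is a hypothetical helper name, and the default threshold of 2 per class is just an assumption for illustration, not a documented glmnet requirement.

```r
# Hypothetical helper: TRUE only if y has exactly two classes and each one
# occurs at least min_per_class times.
has_both_classes <- function(y, min_per_class = 2) {
  counts <- table(y)
  length(counts) == 2 && all(counts >= min_per_class)
}

has_both_classes(c(TRUE, TRUE, TRUE))          # FALSE: only one class present
has_both_classes(c(TRUE, FALSE, TRUE, FALSE))  # TRUE: two of each
```

In a loop over variable combinations, a call could be skipped (or logged) whenever this returns FALSE for depM.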
Upvotes: 9