I am having trouble running 10-fold cross-validation for a logistic regression in R.
I used the cv.glm() function, but it threw an error. However, when I used the same function on the Smarket data from the ISLR package, it did not show any error. The predictors in my logistic regression are binary.
library(boot)  # cv.glm() comes from the boot package
# 10-fold cross-validation for logistic regression
cv.errorlog7 <- cv.glm(p, logit7, K = 10)$delta[1]
I got the following error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor gender has new levels Other
In addition: Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
2: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
3: In y - yhat :
longer object length is not a multiple of shorter object length
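The first error means that the held-out fold contains a level of gender (Other) that the model refit on the remaining folds never saw. glm() drops unused factor levels when it builds the model frame, so predict() has no coefficient for Other and stops. A quick base-R check for whether a given fold split will run into this is sketched here; find_unseen_levels() is a hypothetical helper, the toy data are made up, and the fold assignment is a plain random split that may not match cv.glm()'s internals exactly.

```r
# Hypothetical helper: for each fold, list factor levels that appear in the
# held-out fold but nowhere in the training folds. These are the levels that
# make predict() fail with "factor X has new levels ...".
find_unseen_levels <- function(data, folds) {
  factor_cols <- names(data)[vapply(data, is.factor, logical(1))]
  out <- list()
  for (k in sort(unique(folds))) {
    train <- data[folds != k, , drop = FALSE]
    test  <- data[folds == k, , drop = FALSE]
    for (col in factor_cols) {
      unseen <- setdiff(unique(as.character(test[[col]])),
                        unique(as.character(train[[col]])))
      if (length(unseen) > 0) {
        out[[length(out) + 1]] <- data.frame(fold = k, variable = col,
                                             level = unseen,
                                             stringsAsFactors = FALSE)
      }
    }
  }
  if (length(out) == 0) {
    # No problem levels: return an empty data frame with the same columns
    return(data.frame(fold = integer(0), variable = character(0),
                      level = character(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, out)
}

# Made-up toy data: "Other" occurs only once, so whichever fold holds it
# leaves that level absent from the training set.
set.seed(1)
toy <- data.frame(
  gender = factor(c(rep(c("Female", "Male"), 10), "Other")),
  y = rbinom(21, 1, 0.5)
)
# Plain random assignment of rows to 10 folds
folds <- sample(rep(1:10, length.out = nrow(toy)))
find_unseen_levels(toy, folds)
```

Running the same check on your own data frame with your own fold vector would show which variables and levels cause the failure. A level with a single observation will fail in whichever fold holds it, no matter how the folds are drawn, so it has to be merged into another level or dropped.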
I encountered a very similar error:
> set.seed(100)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor Month has new levels July
# Reset seed
> set.seed(1000)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor Month has new levels July
As you can see, I even reset the seed and tried again, with no luck. However, when I increased the number of folds to 5 (I just kept increasing it by 1 until I got a result), the code ran, although I still got an error and a warning:
> cv.lm(data = catering, form.lm = model, m=5) # 5 fold cross-validation
Response.... Anova table....
Error in which.min(xval) :
'list' object cannot be coerced to type 'double'
In addition: Warning message:
In cv.lm(data = catering, form.lm = model, m = 5) :
As there is >1 explanatory variable, cross-validation
predicted values for a fold are not a linear function
of corresponding overall predicted values. Lines that
are shown for the different folds are approximate
So I would try increasing the number of folds. Especially since you have a relatively small dataset, it shouldn't add much computation time.
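If more folds still don't help, another option (not from the answer above) is to assign folds separately within each level of the problem factor, so that any level with at least two observations lands in at least two different folds and is therefore present in every training set. As far as I know, cv.glm() has no argument for a user-supplied fold vector, so this sketch runs the cross-validation loop by hand. stratified_folds() and cv_logistic() are hypothetical helpers, the data are made up, and the error measure is the misclassification rate at a 0.5 threshold, which differs from cv.glm()'s default squared-error cost.

```r
# Hypothetical helper: assign fold labels separately within each level of
# `strata`. A level with >= 2 observations gets at least two distinct fold
# labels, so it always appears in the training set of every fold.
stratified_folds <- function(strata, K) {
  strata <- as.factor(strata)
  folds <- integer(length(strata))
  for (lev in levels(strata)) {
    idx <- which(strata == lev)
    # Cycle through a random permutation of 1..K, then shuffle within the level
    labels <- rep(sample.int(K), length.out = length(idx))
    folds[idx] <- labels[sample.int(length(idx))]
  }
  folds
}

# Hypothetical helper: K-fold CV for a logistic regression, returning the
# overall misclassification rate at a 0.5 probability threshold.
cv_logistic <- function(formula, data, folds) {
  response <- all.vars(formula)[1]
  wrong <- 0
  for (k in sort(unique(folds))) {
    train <- data[folds != k, , drop = FALSE]
    test  <- data[folds == k, , drop = FALSE]
    fit <- glm(formula, family = binomial, data = train)
    prob <- predict(fit, newdata = test, type = "response")
    wrong <- wrong + sum(as.integer(prob > 0.5) != test[[response]])
  }
  wrong / nrow(data)
}

# Made-up example data: "Other" occurs only twice.
set.seed(42)
toy <- data.frame(
  gender = factor(c(rep(c("Female", "Male"), 20), "Other", "Other")),
  smoker = factor(sample(c("no", "yes"), 42, replace = TRUE)),
  y = rbinom(42, 1, 0.4)
)
folds <- stratified_folds(toy$gender, K = 10)
cv_logistic(y ~ gender + smoker, toy, folds)
```

This stratifies on one factor only; with several rare categorical predictors each would need to be checked separately. glm() may also warn that fitted probabilities of 0 or 1 occurred when a level has only one observation in a training set.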