fitrat

Reputation: 65

Cross-validation for logistic regression

I am having trouble running 10-fold cross-validation for a logistic regression in R.

I used the cv.glm() function, but it returned an error. The same function ran without errors on the Smarket data from the ISLR package. The predictors in my logistic regression are binary.

# 10-Fold Cross-Validation for Logistic Regression
cv.errorlog7 <- cv.glm(p, logit7, K=10)$delta[1] 

I got the following error message:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor gender has new levels Other
In addition: Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type ==  :
  prediction from a rank-deficient fit may be misleading
2: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type ==  :
  prediction from a rank-deficient fit may be misleading
3: In y - yhat :
  longer object length is not a multiple of shorter object length

Upvotes: 0

Views: 1137

Answers (1)

Robert Yost

Reputation: 11

I encountered a very similar error:

> set.seed(100)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor Month has new levels July
# Reset seed
> set.seed(1000)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor Month has new levels July

As you can see, I even reset the seed and tried again, with no luck. However, when I increased the number of folds (I kept increasing it by 1 until I got a response) to 5, the code ran, although I still got an error and a warning:

> cv.lm(data = catering, form.lm = model, m=5) # 5 fold cross-validation
Response.... Anova table....
Error in which.min(xval) : 
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In cv.lm(data = catering, form.lm = model, m = 5) : 

 As there is >1 explanatory variable, cross-validation
 predicted values for a fold are not a linear function
 of corresponding overall predicted values.  Lines that
 are shown for the different folds are approximate

So I would try increasing the number of folds. Since you have a relatively small dataset, it shouldn't affect performance too much.
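For what it's worth, the error seems to appear whenever the held-out rows of a fold contain a factor level that is missing from the rows used to fit the model. More folds make that less likely, but a level that occurs in only a handful of rows can still trigger it. Below is a rough sketch of the mechanism and one possible workaround. The data frame is made up (it only borrows the names `p` and `gender` from the question), the `min_rows` threshold is arbitrary, and dropping rare rows is just one option, not necessarily the right choice for your data.

```r
library(boot)  # cv.glm(); boot ships with standard R installations

set.seed(1)
# Made-up stand-in for the question's data frame `p`: "Other" occurs only twice
n <- 200
p <- data.frame(
  gender = factor(c("Other", "Other",
                    sample(c("Female", "Male"), n - 2, replace = TRUE))),
  x = rbinom(n, 1, 0.5)
)
p$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * p$x))

# 1. The mechanism: fit without the "Other" rows, then predict on them
fit_train <- glm(y ~ gender + x, family = binomial, data = p[-(1:2), ])
msg <- tryCatch(
  predict(fit_train, newdata = p[1:2, ], type = "response"),
  error = function(e) conditionMessage(e)
)
msg  # the "new levels" error from the question

# 2. Count rows per level; levels with very few rows are the ones at risk
level_counts <- table(p$gender)
level_counts

# 3. One possible workaround: drop rows with rare levels, then cross-validate
min_rows <- 10  # arbitrary threshold chosen for this sketch
keep <- names(level_counts)[level_counts >= min_rows]
p2 <- droplevels(p[p$gender %in% keep, ])
logit <- glm(y ~ gender + x, family = binomial, data = p2)
cv_error <- cv.glm(p2, logit, K = 10)$delta[1]
cv_error
```

Running `table()` on each factor predictor before calling cv.glm() or cv.lm() should show which levels are too rare for the number of folds you picked. Merging rare levels into a broader category, instead of dropping the rows, would be another way to go.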

Upvotes: 1
