Reputation: 35
I'm trying to write a function that performs k-fold cross-validation on a logistic regression model using different thresholds. By threshold I mean the probability cutoff above which the model's output is turned into a prediction of 1 rather than 0. For instance, with a threshold of .4, a probability of .42 would be coded as a prediction of 1.
To run cross-validation with logistic regression, I need to write my own cost function (the default calculates the MSE) and pass it to the cv.glm() function. The function below works if I use a static threshold, but I want the threshold to change on each iteration, so I defined my cost function inside the loop. I'm getting an error 'object i not found'. Is there a way to create a function inside another function that uses variables not passed to it as arguments?
logit.CV<-function(data, model, K, firstThreshold, lastThreshold) {
  error<-NULL
  for(i in seq_along(firstThreshold:lastThreshold)) {
    costFunction<-function(y, pred) {
      pred<-ifelse(pred > (i+firstThreshold-1)/10, 1, 0)
      mean(abs(y-pred) > .5)
    }
    error[i]<-cv.glm(amData, logit.mod, cost=costFunction, K=10)$delta[1]
  }
  print(error)
}
Upvotes: 1
Views: 1622
Reputation: 206187
It doesn't look like there is anything inherently wrong with doing that. This example seems to work:
runner<-function(f, n) f(n)
for(i in 1:10) {
  pepper<-function(n) {
    rep(n,i)
  }
  print(runner(pepper, letters[i]))
}
So it must have something specific to do with the way cv.glm is calling the function. What about building the cost function with a factory that takes i as an argument?
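logit.CV<-function(data, model, K, firstThreshold, lastThreshold) {
  error<-NULL
  # Factory: returns a cost function with i captured in its enclosing environment
  getCostFunction<-function(i) {
    force(i)  # evaluate i now rather than lazily
    function(y, pred) {
      pred<-ifelse(pred > (i+firstThreshold-1)/10, 1, 0)
      mean(abs(y-pred) > .5)
    }
  }
  for(i in seq_along(firstThreshold:lastThreshold)) {
    # use the function's own arguments rather than the globals amData and logit.mod
    error[i] <- cv.glm(data, model, cost=getCostFunction(i), K=K)$delta[1]
  }
  print(error)
}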
If that still doesn't work, perhaps you can make a reproducible example using test data in the package so others can actually run it and try it out.
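As a starting point for such an example, here is a minimal sketch of the factory approach on a built-in data set. The choice of mtcars, the model am ~ wt, the threshold grid, K=4, and the names makeCost and cvError are all assumptions made for illustration; none of them come from the question.

```r
library(boot)  # provides cv.glm(); a recommended package bundled with R

# Hypothetical factory: returns a cost function that classifies at the
# given threshold and reports the misclassification rate.
makeCost <- function(threshold) {
  force(threshold)  # evaluate now so the closure captures this value
  function(y, pred) mean(y != as.numeric(pred > threshold))
}

set.seed(1)  # cv.glm assigns observations to folds at random

# Illustrative logistic model on a built-in data set (an assumption,
# not the asker's data)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# One cross-validated error estimate per threshold
thresholds <- seq(0.3, 0.7, by = 0.1)
cvError <- sapply(thresholds, function(t) {
  cv.glm(mtcars, fit, cost = makeCost(t), K = 4)$delta[1]
})
names(cvError) <- thresholds
print(cvError)
```

If this runs cleanly but your own function still fails, the difference between the two is probably where the problem lies.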
Upvotes: 1