Woody Harelson

Reputation: 35

R embed function within function

I'm trying to write a function that performs k-fold cross-validation on a logistic regression model using different thresholds. By threshold I mean the probability cutoff at which the model's output is turned into a prediction of 1 or 0. For instance, with a threshold of .4, a probability of .42 would be coded as a prediction of 1.

To run cross-validation with logistic regression, I need to write my own cost function (the default calculates the MSE) and pass it to cv.glm(). The function below works if I use a static threshold, but I want the threshold to change on each iteration, so I defined my cost function inside the loop. I'm getting the error 'object i not found'. Is there a way to create a function inside another function that uses variables not passed to it as arguments?

logit.CV<-function(data, model, K, firstThreshold, lastThreshold) {
    error<-NULL

    for(i in seq_along(firstThreshold:lastThreshold)) {

        # cost function defined inside the loop so that it can see i
        costFunction<-function(y, pred) {
            pred<-ifelse(pred > (i+firstThreshold-1)/10, 1, 0)
            mean(abs(y-pred) > .5)
        }

        error[i]<-cv.glm(amData, logit.mod, cost=costFunction, K=10)$delta[1]

    }

    print(error)

}

Upvotes: 1

Views: 1622

Answers (1)

MrFlick

Reputation: 206187

There doesn't appear to be anything inherently wrong with defining a function inside a loop. This example works:

runner<-function(f, n) f(n)

for(i in 1:10) {
   pepper<-function(n) {
       rep(n,i)
   }
   print(runner(pepper, letters[i]))
}

So the problem must have something specific to do with the way cv.glm is calling the function. What about building the cost function with a helper instead:

logit.CV<-function(data, model, K, firstThreshold, lastThreshold) {
    error<-NULL

    # returns a cost function that carries its own copy of i
    getCostFunction<-function(i) {
        function(y, pred) {
            pred<-ifelse(pred > (i+firstThreshold-1)/10, 1, 0)
            mean(abs(y-pred) > .5)
        }
    }

    for(i in seq_along(firstThreshold:lastThreshold)) {
        # use the function's own arguments rather than the globals amData / logit.mod
        error[i] <- cv.glm(data, model, cost=getCostFunction(i), K=K)$delta[1]
    }

    print(error)
}
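
To see what the helper is doing without involving cv.glm at all, here is a minimal base-R sketch of the same pattern. The names make_cost, y and prob are made up for illustration, and the probabilities are toy values rather than model output. Each call to the factory gets its own environment, and force() makes sure the threshold is evaluated right away rather than lazily.

```r
# Hypothetical factory: returns a cost function bound to one threshold.
make_cost <- function(threshold) {
  force(threshold)  # evaluate the argument now instead of on first use
  function(y, pred) {
    pred_class <- ifelse(pred > threshold, 1, 0)
    mean(abs(y - pred_class) > 0.5)  # misclassification rate
  }
}

# Toy data: true classes and made-up predicted probabilities.
y    <- c(0, 0, 1, 1)
prob <- c(0.2, 0.45, 0.55, 0.9)

# One cost function per threshold, each remembering its own cutoff.
costs <- lapply(c(0.4, 0.5, 0.6), make_cost)
rates <- sapply(costs, function(f) f(y, prob))
print(rates)
```

If the functions are created in one place and called later (as with lapply here), the force() call is what keeps each one tied to the threshold it was created with.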

If that still doesn't work, perhaps you can make a reproducible example using test data from the package so others can actually run it and try it out.
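
Something along these lines could serve as a starting point for a reproducible example. It is only a sketch: it assumes the boot package is installed and uses its nodal data set with the model from the cv.glm help page, not your amData or logit.mod. The names make_cost and logit_cv and the threshold values are made up for illustration.

```r
library(boot)  # provides cv.glm() and the nodal data set

# Hypothetical factory: one cost function per threshold.
make_cost <- function(threshold) {
  force(threshold)
  function(y, pred) mean(abs(y - ifelse(pred > threshold, 1, 0)) > 0.5)
}

# Hypothetical wrapper: cross-validated misclassification rate per threshold.
logit_cv <- function(data, model, K, thresholds) {
  vapply(thresholds, function(th) {
    cv.glm(data, model, cost = make_cost(th), K = K)$delta[1]
  }, numeric(1))
}

# Logistic regression on the nodal data (binary response r).
nodal_fit <- glm(r ~ stage + xray + acid, family = binomial, data = nodal)

set.seed(1)  # the fold assignment is random
# K = 11 is an arbitrary choice for the 53 rows; cv.glm may adjust K and warn.
cv_errors <- logit_cv(nodal, nodal_fit, K = 11, thresholds = c(0.3, 0.4, 0.5, 0.6))
print(cv_errors)
```

If this runs cleanly but your version still fails, the difference between the two is probably where to look.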

Upvotes: 1
