Charlie
Charlie

Reputation: 2851

'optimize' not finding variables inside a function call

I have the following function that draws some data from a chi-squared distribution and compares the distribution of X to a known chi-squared distribution using maximum likelihood. This procedure is simulated nSims times. (I compare these results to results from a permutation test, but that code is excluded.)

chi2c <- function(xdf=2, yObs=100, xObs=100, nSims=1000, nPerm=500, alpha=0.05){
  simResults <- sapply(1:nSims, function(x){
    # Draw variables
    x  <- rchisq(xObs, df=xdf)
    # Other variables not relevant here
    # [[snip]]

    # Permutation test
    # [[snip]]

    # Calculate the statistics necessary for maximum likelihood
    n       <<- length(x)
    sumlx   <<- sum(log(x))
    sumx    <<- sum(x)
    # Calculate the maximum likelihood estimate
    dfhat   <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum
    # Calculate the test statistic: -2 times the log likelihood ratio
    llr     <- -2 * (c2ll(2) - c2ll(dfhat))
    # Compare the test statistic to its asymptotic dist: chi-squared
    lReject <- llr > qchisq(1 - alpha, df=1)

    # Provide the results
    # [[snip]]
  })
  # Calcuate means across simulations
  rowMeans(simResults)
}

This function calls c2ll, the chi-squared likelihood function

c2ll <- function(dfHat){
  -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) + 
  (dfHat/2 - 1) * sumlx - sumx/2
}

This function does just what I would like and is accurate, but I don't understand why I have to set the maximum likelihood statistics (n, sumlx, and sumx) globally to get it to work; optimize doesn't find them if I only set them inside the function using <-. I tried setting them inside of optimize, but that didn't work either. Thanks for your help.

Charlie

Upvotes: 2

Views: 378

Answers (3)

Ian Fellows
Ian Fellows

Reputation: 17348

R has lexical scoping, which means that functions look for variables in the environment in which they were defined. c2ll is defined in the global environment, so it doesn't see your definitions of n, sumx, and sum1x inside the function. S on the other hand uses dynamic scoping, which behaves as you expect (i.e. it looks for variables in the scope in which it was called). Computer scientists generally believe that dynamic scoping was a dead-end bad idea, and that lexical scoping is the way to go.

As a practical matter, what can you do about this?

Well, there are a couple options...

First, you can define your function locally:

n       <- length(x)
sumlx   <- sum(log(x))
sumx    <- sum(x)
c2ll <- function(dfHat){
    -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) + (dfHat/2 - 1) * sumlx - sumx/2
}
dfhat   <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum

Second, you can have c2ll take additional parameters, and pass those parameters through optimize.

#in global env
c2ll <- function(dfHat,n,sum1x,sumx){
  -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) + 
  (dfHat/2 - 1) * sumlx - sumx/2
}

#...
#in function
n       <- length(x)
sumlx   <- sum(log(x))
sumx    <- sum(x)
dfhat   <- optimize(f=c2ll, interval=c(1, 10), n=n,sum1x=sum1x,sumx=sumx, maximum=TRUE)$maximum

Both are clean options that preserve the encapsulation of your functions.

Upvotes: 2

VitoshKa
VitoshKa

Reputation: 8523

Your problem stems from the lexical scoping rules that R uses. See more in the language definitions manual. In short your function c2ll is looking for the variables in the environment it was defined.

To avoid that problem you have to pass the n, sum1x and sum2x explicetly as arguments to your function, or define your function locally in the ch2c function directly.

This is quite a common question, there are a lot of interesting examples on SO.

Upvotes: 1

IRTFM
IRTFM

Reputation: 263301

Your simResults function returns a logical vector. Rather than using rowMeans, just use mean(simResults) and the results look reasonably sensible ... at least to the extent that:

> chi2c(alpha=0.05)
[1] 0.057
> chi2c(alpha=0.5)
[1] 0.503

Upvotes: 1

Related Questions