John
John

Reputation: 312

glm.nb link function in MASS package not working when using variable within function call?

When I use an object that stores the link function for the MASS package's glm.nb function, the function errors out. However, when I just hard-code the link function, it works fine. I think this has to be due to some sort of environmental referencing gone wrong in their source code, but wanted to be certain before emailing the authors. So, any insight is appreciated here.

MWE

# Simulate some data
set.seed(666)
n = 1000
x = cbind(1, rnorm(n))
b = c(1,-2)
lam = exp(x%*%b)
y = rnbinom(n, size=.3, mu = lam)

# get package
if(!("MASS" %in% rownames(installed.packages())) 
    install.packages("MASS")
library(MASS)

Attempt 1 -- Define link as parameter default

# Define some wrapper function around glm.nb

mfn = function(y, x, mylink = "log"){
  
  MASS::glm.nb(y ~ x - 1, link=mylink)
  
}


# Call mfn on simulated data

mfn(y, x)

> Error in poisson(link = mylink) : object 'mylink' not found


# Call with explicit link in mfn

mfn(y = y, x = x, mylink="log")

>  Error in poisson(link = mylink) : object 'mylink' not found 


Attempt 2 -- Modifying mfn to define mylink in function environment

# Redefine function

mfn = function(y, x){
  
  mylink = "log" 

  MASS::glm.nb(y ~ x - 1, link=mylink)
  
}


# Call on simulated data

mfn(y = y, x = x)

> Error in poisson(link = mylink) : object 'mylink' not found

Attempt 3 Combine (1) and (2)

# Redefine mfn

mfn = function(y, x, mylink = "log"){
    
  mylink = "log"
   
  MASS::glm.nb(y ~ x - 1, link = mylink)
  
}


mfn(y = y, x = x, mylink="log")

> Error in poisson(link = mylink) : object 'mylink' not found 

Just to be sure -- Hard-code link function

mfn = function(y, x){
  
  mylink = "log"
  MASS::glm.nb(y ~ x - 1, link="log")
  
}

mfn(y = y, x = x)

> Call:  MASS::glm.nb(formula = y ~ x - 1, link = "log", init.theta = 0.2953628307)

Coefficients:
     x1       x2  
 0.9415  -1.9165  

Degrees of Freedom: 1000 Total (i.e. Null);  998 Residual
Null Deviance:      6191 
Residual Deviance: 853.8    AIC: 4248

So its pretty clear that there is some sort of referencing issue going on, but I can't immediately find it in the source code nor do I know of a simpler way to test the issue. Any ideas?

Upvotes: 2

Views: 887

Answers (2)

John
John

Reputation: 312

Okay so I've reached out to the MASS package maintainers, and Bill Venables kindly responded with a working solution for me. For completeness, I've pasted it below:

### negative binomial with parametric link function

ngb <- function(formula, data = list(), link) {
  stopifnot(require(MASS))
  thisCall <- substitute(MASS::glm.nb(formula = FORM, data = DATA, link = LINK),
                         list(FORM = substitute(formula),
                              DATA = substitute(data),
                              LINK = link))
  eval.parent(thisCall)
}

library(MASS)
set.seed(5678)
N <- 1000
dummy_data <- within(data.frame(x = rnorm(N)), {
  eta <- 1 - 2*x
  mu <- exp(eta)
  y <- rnegbin(eta, mu, theta = 3)
}) 

tst1 <- ngb(y ~ x, dummy_data, link = "log")

attach(dummy_data)
tst2 <- ngb(y ~ x, link = "log")
detach()

tst3 <- ngb(y ~ poly(x, 2), dummy_data, link = "log")

rbind(coef(tst1), coef(tst2))

summary(tst3)

Upvotes: 1

Eric Leung
Eric Leung

Reputation: 2594

Try the assignment operator (<-) instead of the equals sign (=) when you set the function to the name mfn.

It appears that the MASS::glm.nb function does some non-standard evaluation (NSE) on the link parameter. If you look at the documentation, the input is an unquoted log, with other options as sqrt and identity. Again note the missing quotes around them.

Without getting too into NSE, I think the simplest solution is to use ... as a parameter and just pass in the link argument again. This should still give you the flexibility to change your model and add more parameters if you so choose.

set.seed(666)
n = 1000
x = cbind(1, rnorm(n))
b = c(1,-2)
lam = exp(x%*%b)
y = rnbinom(n, size=.3, mu = lam)

mfn <- function(y, x, ...){
  MASS::glm.nb(y ~ x - 1, ...)
}

# Call mfn on simulated data
mfn(y, x, link = log)

which outputs

Call:  MASS::glm.nb(formula = y ~ x - 1, link = log, init.theta = 0.2953628307)

Coefficients:
     x1       x2  
 0.9415  -1.9165  

Degrees of Freedom: 1000 Total (i.e. Null);  998 Residual
Null Deviance:      6191 
Residual Deviance: 853.8    AIC: 4248

As a side note in case you want to see how the MASS::glm.nb function is doing this object manipulation, you can see it in the first few lines of the source code by typing in the function name without parentheses.

MASS::glm.nb
# function (formula, data, weights, subset, na.action, start = NULL, 
#     etastart, mustart, control = glm.control(...), method = "glm.fit", 
#     model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, ..., 
#     init.theta, link = log) 
# {
#     loglik <- function(n, th, mu, y, w) sum(w * (lgamma(th + 
#         y) - lgamma(th) - lgamma(y + 1) + th * log(th) + y * 
#         log(mu + (y == 0)) - (th + y) * log(th + mu)))
#     link <- substitute(link)  <--- This is the line that is messing things up
#     ...

Upvotes: 1

Related Questions