user321627
user321627

Reputation: 2572

How to correctly pass formulas associated with a variable name with random effects into fitted regression models in `R`?

I currently have a problem in that I have to pre-specify my formulas before sending them into a regression function. For example, using the stan_gamm4 function in R, we have the following example:

dat <- mgcv::gamSim(1, n = 400, scale = 2) ## simulate 4 term additive truth
## Now add 20 level random effect `fac'...
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~ (1 | fac), 
                 chains = 1, iter = 200) # for example speed

Now, because the formula and random formula were specified explicitly, then if we call:

br$call$random
> ~(1 | fac)

We are able to retrieve the form of the random effects.

NOW, let us then leave everything the same, BUT use an expression for the random part:

formula.rand <- as.formula( '~(1|fac)' )

Then, if we did the same thing before, but with formula.rand taking the place, we have:

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = formula.rand, 
                     chains = 1, iter = 200) # for example speed

BUT NOW: we have that:

br$call$random
> formula.rand

Instead of the original. A lot of bayesian packages rely on accessing br$call$random, so is there a way to use a variable for formula, have it pass in, AND retain the original relation when calling br$call$random? Thanks.

Upvotes: 2

Views: 202

Answers (3)

Hong Ooi
Hong Ooi

Reputation: 57696

While I haven't used Stan, this is a problem inherent in the way that R handles storing calls. You can see it happening with lm, for example:

model <- function(formula)
{
    lm(formula, data=mtcars)
}
m <- model(mpg ~ disp)
m$call$formula
# formula

The simplest solution is to construct the call using substitute to insert the actual values you want to keep, not the symbol name. In the case of lm, this would be something like

model2 <- function(formula)
{
    call <- substitute(lm(formula=.f, data=mtcars), list(.f=formula))
    eval(call)
}
m2 <- model2(mpg ~ disp)
m2$call$formula
# mpg ~ disp

For Stan, you can do

stan_call <- substitute(br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data=dat, random=.rf,
                 chains=1, iter=200),
             list(.rf=formula.rand))
br <- eval(stan_call)

Upvotes: 1

Ben Goodrich
Ben Goodrich

Reputation: 4990

IMHO, this is not a problem with stan_gamm4. In your second example, if you then do

class(br$call$random)

you will see that it is of class "name". So, it is not as if $call is just some list with stuff in it. In order to access it programatically in general, you need to evaluate that with

eval(br$call$random)

in order to obtain ~(1 | fac), which is of class "formula".

Upvotes: 1

Thomas K
Thomas K

Reputation: 3311

If I understand correctly, your problem is not, that stan_gamm4 could be computing incorrect results (which is not the case, from what I gather), but only that br$call$random refers to the variable name and not the formula. This seems to be problematic for further post-processing of the model.

Since stan_gamm4 uses match.call inside to find the call, I don't know of a way to specify the model differently to obtain a "correct" br$call$random up front. But you can simply modify it after the fact via:

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = formula.rand) 

br$call$random <- formula.rand
br$call$random
#> ~(1 | fac)

and the continue with whatever you are doing.

Upvotes: 1

Related Questions