Sam G
Sam G

Reputation: 25

Error including correlation structure in function with gamm

I am trying to create my own function that contains 1.) the mgcv gamm function and 2.) a nested autocorrelation (ARMA) argument. I am getting an error when I try to run the function like this:

df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)

try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}

try_fxn1(df,"x")

Error in eval(predvars, data, env) : object 'dfz' not found

I know the issue is with the correlation portion of the formula, as when I run the same function without the correlation structure included (as seen below), the function behaves as expected.

try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}

try_fxn2(df,"x")

Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!

Upvotes: 2

Views: 293

Answers (1)

Gavin Simpson
Gavin Simpson

Reputation: 174813

You are confusing a vector with the symbolic representation of that vector when building a formula.

You don't want dfz[[colz]] as the response in the formula, you want x or whatever you set colz too. What you are getting is

dfz[[colz]] ~ ...

when what you really want is the variable colz:

colz ~ ...

And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:

fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')

This turns colz into whatever it was storing, not the literal colz:

> fml
[1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"

Then convert the string into a formula object using formula() or as.formula().

The final solution then is:

fit_fun <- function(dfz, colz) {
    fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
    fml <- formula(fml)
    gamm(fml, data = df, correlation = corARMA(form = ~ 1|yr, p=2))
}

This really is not an issue with corARMA() part, other than that triggers somewhat different evaluation code for the formula. The guiding mantra here is to always get a formula as you would type it if not programming with formulas. You would never (or should never) write a formula like

gamm(df[[var]] ~ x + s(z), ....)

While this might work in some settings, it will fail miserably if you ever want to use predict()` and it fails when you have to do something a little more complicated.

Upvotes: 1

Related Questions