Reputation: 23
I would like to know if there is another way to write the call:
gam(VariableResponse ~ s(CovariateName1) + s(CovariateName2) + ... + s(CovariateName100),
family = gaussian(link = identity), data = MyData)
in the mgcv package without typing the names of all 100 covariates as above? Suppose that MyData contains only VariableResponse in column 1, CovariateName1 in column 2, and so on.
Many thanks!
Upvotes: 0
Views: 1438
Reputation: 174813
Yes, use the brute-force approach: generate the formula by pasting the covariate names together with the strings 's(' and ')', then collapsing the whole thing with ' + '. Then convert the resulting string to a formula and pass that to gam(). You may need to fix issues with the formula's environment if gam() can't find the variables you name, because it does some non-standard evaluation (NSE) on the formula to identify which terms are smooths that need estimating and hence need to be replaced by a basis expansion.
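If the environment problem does come up, one possible remedy is to set the formula's environment explicitly. The sketch below is only an illustration: build_smooth_formula() is a hypothetical helper (not part of mgcv), and the choice of globalenv() is an assumption that the variables live in the global workspace. It relies on the env argument of as.formula() and on the environment<- replacement function from base R.

```r
# Hypothetical helper (not part of mgcv): builds response ~ s(v1) + s(v2) + ...
# and attaches an explicitly chosen environment to the resulting formula.
build_smooth_formula <- function(response, covariates, env = globalenv()) {
  rhs <- paste0("s(", covariates, ")", collapse = " + ")
  # as.formula() accepts an `env` argument; without it, the formula would
  # carry the environment of this function call instead.
  as.formula(paste(response, "~", rhs), env = env)
}

fm_env <- build_smooth_formula("y", c("x0", "x1"))

# Inspect which environment the formula carries
environment(fm_env)

# The environment can also be reassigned after the formula has been created
environment(fm_env) <- globalenv()
```

When the data are supplied through the data argument, as in the example that follows, this step is often unnecessary; it matters mainly when the formula is built inside a function.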
library(mgcv)
set.seed(2) ## simulate some data...
df <- gamSim(1, n=400, dist = "normal", scale = 2)
> names(df)
[1] "y" "x0" "x1" "x2" "x3" "f" "f0" "f1" "f2" "f3"
We'll ignore the last 5 of those columns for the purposes of this example
df <- df[1:5]
Make the formula
fm <- paste('s(', names(df[ -1 ]), ')', sep = "", collapse = ' + ')
fm <- as.formula(paste('y ~', fm))
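As an alternative sketch of the same step, base R's reformulate() can assemble the formula from a character vector of term labels, which avoids the manual collapse with ' + '. The names covs and fm2 are introduced here for illustration only; with the data above, covs would be names(df)[-1].

```r
# Covariate names; with the example data this would be names(df)[-1]
covs <- c("x0", "x1", "x2", "x3")

# Wrap each name in s( ) and let reformulate() join the terms with `+`
fm2 <- reformulate(paste0("s(", covs, ")"), response = "y")

print(fm2)
```

The result is an ordinary formula object, so it can be passed to gam() in the same way as fm.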
Now fit the model
m <- gam(fm, data = df)
> m
Family: gaussian
Link function: identity
Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)
Estimated degrees of freedom:
2.5 2.4 7.7 1.0 total = 14.6
GCV score: 4.050519
You do have to be careful about fitting GAMs this way, however; concurvity (the nonlinear counterpart to multicollinearity in linear models) can cause catastrophically bad estimates of the smooth functions.
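One way to look for this problem, sketched here under the assumption that the model is refit on the same simulated data as above, is mgcv's concurvity() function. It reports indices between 0 and 1 for each term, with values near 1 suggesting that a smooth can be closely approximated by the other terms in the model. The object name conc is illustrative.

```r
library(mgcv)

set.seed(2)  # same simulated data as in the example above
df <- gamSim(1, n = 400, dist = "normal", scale = 2)[1:5]

m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df)

# full = TRUE compares each term with all the other terms in the model;
# full = FALSE would return pairwise measures instead
conc <- concurvity(m, full = TRUE)
print(round(conc, 2))
```

With 100 covariates, a check along these lines is worth running before interpreting any individual smooth.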
Upvotes: 2