erica_sn

Reputation: 11

mgcv GAM plot and predict with tensor smooth by factor

I am running GAMs to understand spatio-temporal trends in size counts of a Californian mollusc. The response is count data modelled with three-way interactions between space and time (latitude, longitude, year) and between north/east currents and time (uo, vo, year), each partitioned by three size classes (small, med, large). Here's the GAM:

count_te_model.xy.vo.I = gam(count ~ size_bin +
                                te(latitude, longitude, year, d=c(2,1), by=size_bin) +
                                te(vo, uo, year, d=c(2,1), by=size_bin) +
                                offset(log(plots_sampled)),
                              data=LG_count_plot_mpa_F, family=nb(link="log"), method="REML")


summary(count_te_model.xy.vo.I)

Family: Negative Binomial(2.271) 
Link function: log 

Formula:
count ~ size_bin + te(latitude, longitude, year, d = c(2, 1), 
    by = size_bin) + te(vo, uo, year, d = c(2, 1), by = size_bin) + 
    offset(log(plots_sampled))

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    2.60406    0.02163 120.411   <2e-16 ***
size_binmed    0.30197    0.03050   9.900   <2e-16 ***
size_binsmall  0.04658    0.03093   1.506    0.132    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                            edf Ref.df Chi.sq p-value    
te(latitude,longitude,year):size_binlarge 44.64  51.05  630.2  <2e-16 ***
te(latitude,longitude,year):size_binmed   55.82  65.78  563.4  <2e-16 ***
te(latitude,longitude,year):size_binsmall 53.13  60.41  724.4  <2e-16 ***
te(vo,uo,year):size_binlarge              30.58  40.02  105.3  <2e-16 ***
te(vo,uo,year):size_binmed                37.54  49.24  135.8  <2e-16 ***
te(vo,uo,year):size_binsmall              53.13  67.03  266.2  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.429   Deviance explained = 54.4%
-REML =  15736  Scale est. = 1         n = 2944

I now want to map the change in abundance over time for each size class, and was wondering if anyone knows how best to do this with three-way interactions by factor?

I have tried "plot.gam" as the following:

plot(count_te_model.xy.vo.I, all.terms=TRUE, too.far=0.05)

And it produces these plots:

[figures: partial-effect plots for large, medium, and small individuals]

I've also created an x-y grid on which to predict the GAM output and then map it. I'm using the predict function like so:

head(predict_count_coast_L)
# A tibble: 6 x 8
  longitude latitude  year size_bin plots_sampled    uo    vo model_fit
      <dbl>    <dbl> <dbl> <chr>            <dbl> <dbl> <dbl>     <dbl>
1     -124.     41.7  1995 large                5     1     1     0.162
2     -124.     41.7  1995 large                5     1     1     0.161
3     -124.     41.7  1995 large                5     1     1     0.160
4     -124.     41.7  1995 large                5     1     1     0.159
5     -124.     41.7  1995 large                5     1     1     0.158
6     -124.     41.7  1995 large                5     1     1     0.157

predict_count_coast_L$model_fit = predict(count_te_model.xy.vo.I,
                                          predict_count_coast_L,type = "link", 
                                          exclude = "te(vo, uo, year, d=c(2,1), by=size_bin)")

ggplot(aes(longitude, latitude, fill= model_fit),
       data=predict_count_coast_L)+
  geom_tile()+
  facet_wrap(~year,nrow=3)+
  scale_fill_viridis("count")+
  ggtitle("large individuals")+
  theme_bw(10)

This produces maps whose patterns look quite different from the plot.gam() output. (*Note that the years now increase from top to bottom!)

[figures: prediction maps for large, medium, and small individuals]

I'm also still trying to understand the output values from the "link" predict type... the documentation says it 'produces predictions on the scale of the additive predictors', but I am struggling to understand what this actually means. Are these log-link values?
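To make my confusion concrete, this is the kind of check I have in mind (just my assumption that the two scales are related by the log link):

# my assumption: with family = nb(link = "log"), type = "link" gives
# log-scale values, so exponentiating them should match type = "response"
link_fit <- predict(count_te_model.xy.vo.I, predict_count_coast_L, type = "link")
resp_fit <- predict(count_te_model.xy.vo.I, predict_count_coast_L, type = "response")
all.equal(exp(link_fit), resp_fit)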

I've also tried the above with type = "response" instead of "link", and it gives me patterns that differ even more (shown just for small individuals here):

[figure: prediction maps for small individuals, response scale]

If anyone has an idea why these give different outputs, and whether there is a preferred way of predicting/plotting GAMs like these, that would be much appreciated!!

Update #1

Trying to cross-validate models

I am comparing different types of hierarchical models, as described in this paper: [https://peerj.com/articles/6876/?td=tw], by conducting a cross-validation using even years as training and odd years as testing. I am not sure how to back-transform the link values onto the original scale of the size counts. All models use the family nb(link="log"). I have tried the linkinv function below, but I'm not sure whether this is correct or whether I can just use exp()... Any advice would be super helpful!

# split by year parity: even years for training, odd years for testing
LG_train <- subset(LG_count_plot_mpa_F, year %% 2 == 0)
LG_test  <- subset(LG_count_plot_mpa_F, year %% 2 == 1)

# link-scale predictions from each candidate model over the full data set
LG_predict_m = mutate(
  LG_count_plot_mpa_F,
  lg1_model = as.vector(predict(count_te_model.xy.vo.I, LG_count_plot_mpa_F, type = "link")),
  lg2_model = as.vector(predict(count_te_model.xy.vo.G, LG_count_plot_mpa_F, type = "link")),
  lg3_model = as.vector(predict(count_te_model.xy.vo.GI, LG_count_plot_mpa_F, type = "link")),
  data_type = factor(ifelse(year %% 2 == 0, "training", "testing"),
                     levels = c("training", "testing"))
)

# inverse link function of the fitted family (log link, so this is exp())
ilink <- family(count_te_model.xy.vo.I)$linkinv

# back-transform the link-scale predictions onto the count scale
LG_predict_m_2 = mutate(
  LG_count_plot_mpa_F,
  lg1_link = as.vector(ilink(LG_predict_m$lg1_model)),
  lg2_link = as.vector(ilink(LG_predict_m$lg2_model)),
  lg3_link = as.vector(ilink(LG_predict_m$lg3_model)),
  data_type = factor(ifelse(year %% 2 == 0, "training", "testing"),
                     levels = c("training", "testing"))
)

# reshape to long format: one row per model prediction
LG_predict = gather(LG_predict_m_2, key = model, value = count_est,
                    lg1_link:lg3_link) %>%
  mutate(count_est = as.numeric(count_est))

# out-of-sample R^2 on the held-out (odd) years
forecast_accuracy_m = LG_predict %>%
  group_by(model) %>%
  filter(data_type == "testing") %>%
  summarize(out_of_sample_r2 = round(cor(log(count_est), log(count))^2, 2))
print(forecast_accuracy_m)
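For reference, the alternative I was wondering about (assuming the back-transform really is just the inverse of the log link) would be to skip the manual step and ask predict() for the response scale directly:

# alternative (my assumption): predict directly on the count scale; for the
# log link this should equal ilink(lg1_model), i.e. exp(lg1_model)
lg1_resp <- as.vector(predict(count_te_model.xy.vo.I,
                              LG_count_plot_mpa_F, type = "response"))
all.equal(lg1_resp, as.vector(ilink(LG_predict_m$lg1_model)))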

Upvotes: 1

Views: 1504

Answers (1)

Gavin Simpson

Reputation: 174778

You are plotting two very different things: the plots produced via plot() show the partial effects of the selected smooths (the ones you showed), whereas you are predicting from the full model, so the predictions include the effects of all the variables/terms in the model.

You don't exclude smooths the way you are doing it; you should supply the names of the smooths you want to exclude exactly as they appear in the table produced by summary(). So you want:

exclude = c("te(vo,uo,year):size_binlarge",
            "te(vo,uo,year):size_binmed",
            "te(vo,uo,year):size_binlarge")

but even doing that won't get you what you want (assuming you are trying to replicate the output from plot()), as the other parametric terms and the model intercept will also be included in the values generated by predict(), which means your plots will include the group means as well.

I can see a few options:

  1. predict with type = "terms" and identify the columns of the resulting matrix associated with each of the three smooths (the factor-by smooths) you want to plot (see the sketch just after this list).

  2. you could grab the output from the plot() command and then use the data in that object to plot what you want with ggplot: pdat <- plot(....)

  3. use gratia::smooth_estimates() to evaluate the smooths over a grid of values and then use the object returned by that function with ggplot to do the plotting (a rough sketch follows the note below).
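For option 1, here's a minimal sketch (untested; it assumes your prediction grid predict_count_coast_L and that the term labels match what summary() shows):

# type = "terms" returns a matrix with one column per model term, labelled
# exactly as in summary(), each giving that term's contribution to the
# linear predictor (the log scale here)
pred_terms <- predict(count_te_model.xy.vo.I,
                      newdata = predict_count_coast_L,
                      type = "terms")
colnames(pred_terms)  # check the exact labels first

# keep only the contribution of the spatial smooth for large individuals
predict_count_coast_L$spatial_fit <-
  pred_terms[, "te(latitude,longitude,year):size_binlarge"]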

(I hope to have draw() working for 3- and 4-d smooths within {gratia} before the end of the year [2021].)
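And for option 3, a rough sketch (argument and column names in {gratia} have changed across versions, so check ?smooth_estimates on your install rather than trusting the names here):

library(gratia)

# evaluate the smooths over a grid of covariate values; the result is a
# tibble containing the smooth label (matching summary()), the covariate
# values, and the estimated partial effect, ready for plotting with ggplot()
sm <- smooth_estimates(count_te_model.xy.vo.I)
head(sm)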

Upvotes: 1
