Nebulloyd
Nebulloyd

Reputation: 264

One level of GAM factor not wiggly

I am fitting data on territories over time (time series) in a gam using mgcv for male and female birds (n=10, 4 Female, 6 Male) identified by BandNumber. Prelim plots (in ggplot2) indicate that both follow a similar annual pattern of increasing foraging terrotory in the middle of the year and decreasing at the start/end of the year (see below).

![enter image description here

When modelling using gam in mgcv I am finding it hard to find a model that predicts these outcomes. If I run a model that excludes the influence of sex on foraging area size kde, I get a global plot that shows this same trend (see below)

gam1<-gam(kde ~  s(TIME, bs='cr') + s(BandNumber, bs = 're'),
             data = dd,
             family=gaussian(link='log'),
             method = "REML")

draw(gam1, residuals = T)

enter image description here

But, fitting for the interaction of sex and time predicts only males to show this trend and females with a peak roughly in the second year. Note time is as numerical values. (see below).

    gam2<-gam(kde ~ s(TIME, by=SEX, bs='cr') + 
                 s(BandNumber, bs = 're'), 
                 data = dd,
                 family=gaussian(link='log'),
                 method = "REML")

draw(gam2, residuals = T)

enter image description here

I would like to get the female plot over time more "wiggly" to better fit the data but I haven't been able to find a way. I have tried adjusting k and m values. I have also tried using all different smoothers available and using fx=TRUE with increased k but haven't managed to get it to work. I have tried including sex without a spline fit as well in the equation but that didn't change much either. Does anyone have ideas on how to improve the female level of the factor to match the first preliminary plot?

EDIT

After comments from @Gavin Simpson I changed my model specs slightly. But looking at the plot below, I still feel the female level of the sex factor is not well fitted to the data which show a clear seasonal increase and decrease. See below smooth predictions of the model with original data points plotted.

pal<-wesanderson::wes_palette("Royal2")


gam3<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cr') + s(BandNumber, bs = c("re")) + BandNumber,
          data = dd,
          family=gaussian(),
          method = "REML")

prels = with(dd,
             list(TIME=seq(min(TIME), max(TIME), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))

preds<-predict.gam(gam3, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(BandNumber)"),
                   se.fit=TRUE)

preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)%>%
  filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)


ggplot() +
  geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  labs(y="95% KDE (km^2)", x="Date")

enter image description here

As Gavin pointed out it could be that after accounting for individual variation the seasonal trend is no longer present at the female level. However when using a cyclic spline I am able to show the seasonal trend clearly - see below.

gam4<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cc') + s(BandNumber, bs = c("re")) + BandNumber,
      
      data = dd,
      family=gaussian(),
      method = "REML")

prels = with(dd,
             list(MONTH=seq(min(MONTH), max(MONTH), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))

preds<-predict.gam(gam4, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(BandNumber)"),
                   se.fit=TRUE)

preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)


ggplot() +
  geom_ribbon(data=preDF, aes(x=MONTH, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  labs(y="95% KDE (km^2)", x="Date")

![enter image description here

Interestingly, nesting BandNumber within YEAR for the reandom error smooth I get the expected predictions for seasonal fluctuations over time for both levels (not all BandNumbers have data in all years).

gam5<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cr')  + s(YEAR, BandNumber, bs = c("re")),
          
          data = dd,
          family=gaussian(),
          method = "REML")
prels = with(monthlyKDE,
             list(TIME=seq(min(TIME), max(TIME), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  YEAR=sample(YEAR, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))



preds<-predict.gam(gam5, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(YEAR,BandNumber)"),
                   se.fit=TRUE)


preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)%>%
  filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)


ggplot() +
  geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=TIME, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  scale_x_continuous(breaks = 1:12)+
  labs(y="95% KDE (km^2)", x="Month", col="Sex")

enter image description here

Upvotes: 0

Views: 181

Answers (1)

Gavin Simpson
Gavin Simpson

Reputation: 174778

You need to realize that the female smooth in the ggplot is being fitted without considering the fact that you data come from a few birds and are not a set of independent observations.

As such there should be no reason at all for the female smooth in a model that does account for the clustering by bird in the data to match the one in the ggplot, and trying to force it to be wiggly would require overfitting.

ANother issue is that your plot is showing separate GAMs fitted with family = gaussian(link = "identity"), but your model is being fitted on a log-scale linear predictor because you added the log link to the model specification. I don't understand why you think the response (which seems to be a positive or at least non-negative real valued random variable) is conditionally distributed Gaussian?

Your model is also wrongly specified; with factor by smooths, you must include the group means; in this situation you need a parameter if sex term in your formula:

gam2 <- gam(kde ~ SEX + s(TIME, by=SEX, bs='cr') + 
              s(BandNumber, bs = 're'), 
            data = dd,
            family = gaussian(link='log'),
            method = "REML")

Finally, something looks odd in your models; there seems to be very little residual variation suggesting your model fits the data perfectly, which seems at odds with the first plot you showed. Perhaps this is the result of the of the bird random effect?

Upvotes: 1

Related Questions