One level of GAM factor not wiggly

Question

I am fitting data on territories over time (time series) in a gam using mgcv for male and female birds (n=10, 4 Female, 6 Male) identified by BandNumber. Prelim plots (in ggplot2) indicate that both follow a similar annual pattern of increasing foraging terrotory in the middle of the year and decreasing at the start/end of the year (see below).

When modelling using gam in mgcv I am finding it hard to find a model that predicts these outcomes. If I run a model that excludes the influence of sex on foraging area size kde, I get a global plot that shows this same trend (see below)

gam1<-gam(kde ~  s(TIME, bs='cr') + s(BandNumber, bs = 're'),
             data = dd,
             family=gaussian(link='log'),
             method = "REML")

draw(gam1, residuals = T)

But, fitting for the interaction of sex and time predicts only males to show this trend and females with a peak roughly in the second year. Note time is as numerical values. (see below).

    gam2<-gam(kde ~ s(TIME, by=SEX, bs='cr') + 
                 s(BandNumber, bs = 're'), 
                 data = dd,
                 family=gaussian(link='log'),
                 method = "REML")

draw(gam2, residuals = T)

I would like to get the female plot over time more "wiggly" to better fit the data but I haven't been able to find a way. I have tried adjusting k and m values. I have also tried using all different smoothers available and using fx=TRUE with increased k but haven't managed to get it to work. I have tried including sex without a spline fit as well in the equation but that didn't change much either. Does anyone have ideas on how to improve the female level of the factor to match the first preliminary plot?

EDIT

After comments from @Gavin Simpson I changed my model specs slightly. But looking at the plot below, I still feel the female level of the sex factor is not well fitted to the data which show a clear seasonal increase and decrease. See below smooth predictions of the model with original data points plotted.

pal<-wesanderson::wes_palette("Royal2")


gam3<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cr') + s(BandNumber, bs = c("re")) + BandNumber,
          data = dd,
          family=gaussian(),
          method = "REML")

prels = with(dd,
             list(TIME=seq(min(TIME), max(TIME), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))

preds<-predict.gam(gam3, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(BandNumber)"),
                   se.fit=TRUE)

preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)%>%
  filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)


ggplot() +
  geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  labs(y="95% KDE (km^2)", x="Date")

As Gavin pointed out it could be that after accounting for individual variation the seasonal trend is no longer present at the female level. However when using a cyclic spline I am able to show the seasonal trend clearly - see below.

gam4<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cc') + s(BandNumber, bs = c("re")) + BandNumber,
      
      data = dd,
      family=gaussian(),
      method = "REML")

prels = with(dd,
             list(MONTH=seq(min(MONTH), max(MONTH), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))

preds<-predict.gam(gam4, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(BandNumber)"),
                   se.fit=TRUE)

preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)


ggplot() +
  geom_ribbon(data=preDF, aes(x=MONTH, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  labs(y="95% KDE (km^2)", x="Date")

Interestingly, nesting BandNumber within YEAR for the reandom error smooth I get the expected predictions for seasonal fluctuations over time for both levels (not all BandNumbers have data in all years).

gam5<-gam(kde ~  SEX + s(TIME, by=SEX, bs='cr')  + s(YEAR, BandNumber, bs = c("re")),
          
          data = dd,
          family=gaussian(),
          method = "REML")
prels = with(monthlyKDE,
             list(TIME=seq(min(TIME), max(TIME), len=6000),
                  BandNumber=sample(BandNumber, size=6000, replace=T),
                  YEAR=sample(YEAR, size=6000, replace=T),
                  SEX=sample(SEX, size=6000, replace=T)))



preds<-predict.gam(gam5, newdata = prels, 
                   type = 'response', 
                   exclude = c("s(YEAR,BandNumber)"),
                   se.fit=TRUE)


preDF<-prels%>%
  data.frame()%>%
  cbind(.,preds)%>%
  filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)


ggplot() +
  geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
  geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
  geom_point(data=dd, aes(x=TIME, y=kde, col=SEX)) +
  scale_color_manual(values = c(pal[c(3,5)])) +
  labs() +
  theme_classic()+
  scale_x_continuous(breaks = 1:12)+
  labs(y="95% KDE (km^2)", x="Month", col="Sex")

One level of GAM factor not wiggly

Answers (1)

Related Questions