Reputation: 264
I am fitting data on territories over time (time series) in a gam using mgcv for male and female birds (n=10, 4 Female, 6 Male) identified by BandNumber
. Prelim plots (in ggplot2
) indicate that both follow a similar annual pattern of increasing foraging terrotory in the middle of the year and decreasing at the start/end of the year (see below).
When modelling using gam
in mgcv
I am finding it hard to find a model that predicts these outcomes. If I run a model that excludes the influence of sex
on foraging area size kde
, I get a global plot that shows this same trend (see below)
gam1<-gam(kde ~ s(TIME, bs='cr') + s(BandNumber, bs = 're'),
data = dd,
family=gaussian(link='log'),
method = "REML")
draw(gam1, residuals = T)
But, fitting for the interaction of sex
and time
predicts only males to show this trend and females with a peak roughly in the second year. Note time
is as numerical values. (see below).
gam2<-gam(kde ~ s(TIME, by=SEX, bs='cr') +
s(BandNumber, bs = 're'),
data = dd,
family=gaussian(link='log'),
method = "REML")
draw(gam2, residuals = T)
I would like to get the female plot over time more "wiggly" to better fit the data but I haven't been able to find a way. I have tried adjusting k
and m
values. I have also tried using all different smoothers available and using fx=TRUE
with increased k
but haven't managed to get it to work. I have tried including sex
without a spline fit as well in the equation but that didn't change much either. Does anyone have ideas on how to improve the female
level of the factor to match the first preliminary plot?
EDIT
After comments from @Gavin Simpson I changed my model specs slightly. But looking at the plot below, I still feel the female level of the sex
factor is not well fitted to the data which show a clear seasonal increase and decrease. See below smooth predictions of the model with original data points plotted.
pal<-wesanderson::wes_palette("Royal2")
gam3<-gam(kde ~ SEX + s(TIME, by=SEX, bs='cr') + s(BandNumber, bs = c("re")) + BandNumber,
data = dd,
family=gaussian(),
method = "REML")
prels = with(dd,
list(TIME=seq(min(TIME), max(TIME), len=6000),
BandNumber=sample(BandNumber, size=6000, replace=T),
SEX=sample(SEX, size=6000, replace=T)))
preds<-predict.gam(gam3, newdata = prels,
type = 'response',
exclude = c("s(BandNumber)"),
se.fit=TRUE)
preDF<-prels%>%
data.frame()%>%
cbind(.,preds)%>%
filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)
ggplot() +
geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
scale_color_manual(values = c(pal[c(3,5)])) +
labs() +
theme_classic()+
labs(y="95% KDE (km^2)", x="Date")
As Gavin pointed out it could be that after accounting for individual variation the seasonal trend is no longer present at the female
level. However when using a cyclic spline I am able to show the seasonal trend clearly - see below.
gam4<-gam(kde ~ SEX + s(TIME, by=SEX, bs='cc') + s(BandNumber, bs = c("re")) + BandNumber,
data = dd,
family=gaussian(),
method = "REML")
prels = with(dd,
list(MONTH=seq(min(MONTH), max(MONTH), len=6000),
BandNumber=sample(BandNumber, size=6000, replace=T),
SEX=sample(SEX, size=6000, replace=T)))
preds<-predict.gam(gam4, newdata = prels,
type = 'response',
exclude = c("s(BandNumber)"),
se.fit=TRUE)
preDF<-prels%>%
data.frame()%>%
cbind(.,preds)
ggplot() +
geom_ribbon(data=preDF, aes(x=MONTH, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
geom_point(data=dd, aes(x=MONTH, y=kde, col=SEX)) +
scale_color_manual(values = c(pal[c(3,5)])) +
labs() +
theme_classic()+
labs(y="95% KDE (km^2)", x="Date")
Interestingly, nesting BandNumber
within YEAR
for the reandom error smooth I get the expected predictions for seasonal fluctuations over time for both levels (not all BandNumbers have data in all years).
gam5<-gam(kde ~ SEX + s(TIME, by=SEX, bs='cr') + s(YEAR, BandNumber, bs = c("re")),
data = dd,
family=gaussian(),
method = "REML")
prels = with(monthlyKDE,
list(TIME=seq(min(TIME), max(TIME), len=6000),
BandNumber=sample(BandNumber, size=6000, replace=T),
YEAR=sample(YEAR, size=6000, replace=T),
SEX=sample(SEX, size=6000, replace=T)))
preds<-predict.gam(gam5, newdata = prels,
type = 'response',
exclude = c("s(YEAR,BandNumber)"),
se.fit=TRUE)
preDF<-prels%>%
data.frame()%>%
cbind(.,preds)%>%
filter(SEX=='M' & TIME<=19.205 | SEX=='F' & TIME<=19.448)
ggplot() +
geom_ribbon(data=preDF, aes(x=TIME, group=SEX, ymin=fit-se.fit, ymax=fit+se.fit), fill = "grey70", alpha=0.3) +
geom_line(data=preDF, aes(y=fit, x=TIME, group=SEX, col=SEX), size=1)+
geom_point(data=dd, aes(x=TIME, y=kde, col=SEX)) +
scale_color_manual(values = c(pal[c(3,5)])) +
labs() +
theme_classic()+
scale_x_continuous(breaks = 1:12)+
labs(y="95% KDE (km^2)", x="Month", col="Sex")
Upvotes: 0
Views: 181
Reputation: 174778
You need to realize that the female smooth in the ggplot is being fitted without considering the fact that you data come from a few birds and are not a set of independent observations.
As such there should be no reason at all for the female smooth in a model that does account for the clustering by bird in the data to match the one in the ggplot, and trying to force it to be wiggly would require overfitting.
ANother issue is that your plot is showing separate GAMs fitted with family = gaussian(link = "identity")
, but your model is being fitted on a log-scale linear predictor because you added the log link to the model specification. I don't understand why you think the response (which seems to be a positive or at least non-negative real valued random variable) is conditionally distributed Gaussian?
Your model is also wrongly specified; with factor by
smooths, you must include the group means; in this situation you need a parameter if sex
term in your formula:
gam2 <- gam(kde ~ SEX + s(TIME, by=SEX, bs='cr') +
s(BandNumber, bs = 're'),
data = dd,
family = gaussian(link='log'),
method = "REML")
Finally, something looks odd in your models; there seems to be very little residual variation suggesting your model fits the data perfectly, which seems at odds with the first plot you showed. Perhaps this is the result of the of the bird random effect?
Upvotes: 1