I am fitting a GAM to a large dataset with many variables. My response variable is the level of "recruitment" in a herd each fall (autumn), calculated as the fawn:female ratio each fall over a 60-year period.
My problem is that in many years and study sites only 1 to 10 females were recorded, so the ratio is not reliable. For example, if one female and one fawn are seen, recruitment is 100%, but if one more female is seen, it drops to 50%!
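A quick simulation makes the instability concrete. It assumes, purely for illustration, a true recruitment rate of 0.5 fawns per female and Poisson-distributed fawn counts (neither comes from the data above). Under that assumption the variance of the observed ratio is rate/females, so its spread shrinks roughly as 1/sqrt(females):

```r
set.seed(1)
true_rate <- 0.5   # hypothetical fawns per female
n_sims    <- 10000

# Simulated fawn:female ratios for a fixed number of females,
# assuming Poisson fawn counts (an illustrative assumption)
sim_ratio <- function(females) {
  fawns <- rpois(n_sims, lambda = true_rate * females)
  fawns / females
}

small <- sim_ratio(2)    # a year/site with 2 females
large <- sim_ratio(100)  # a year/site with 100 females

# Simulated SDs next to the theoretical SD, sqrt(rate / females)
c(sd_small     = sd(small),
  sd_large     = sd(large),
  theory_small = sqrt(true_rate / 2),
  theory_large = sqrt(true_rate / 100))
```

With 2 females the ratio's standard deviation is about 0.5, seven times larger than with 100 females.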
I need to tell the model that years/study sites with smaller sample sizes should carry less weight than those with larger sample sizes, since these small samples are almost certainly distorting the results.
Above are a table of the number of females observed each year and a histogram of those counts.
My model is as follows:
```r
library(mgcv)

gamFIN <- gam(Fw.FratioFall
              ~ s(year)
              + s(percentage_woody_coverage)
              + s(kmRoads.km2)
              + s(WELLS_ACTIVEinsideD)
              + s(d3)
              + s(WT_DEER_springsurveys)
              + s(BadlandsCoyote.1000_mi)
              + s(Average_mintemp_winter, BadlandsCoyote.1000_mi)
              + s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD)
              + s(BadlandsCoyote.1000_mi, d3)
              # for random intercepts, bs = "re" needs YEAR and StudyArea coded as factors
              + s(YEAR, bs = "re") + s(StudyArea, bs = "re"),
              method = "REML", select = TRUE, data = mydata)
```
How can I tell the model to weight each value of the response by the sample size (number of females) it is based on?
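One direct way to do what is asked, while keeping the ratio as the response, is the `weights` argument of mgcv's `gam()`, which sets prior weights on each observation. With the default Gaussian family a prior weight w means a residual variance of sigma^2/w, so weighting each ratio by its number of females matches the rate/females variance seen in the simulation above. The sketch uses simulated data; `dat`, `x`, `females`, `fawns` and `ratio` are hypothetical stand-ins for the real columns (for the model above this would mean adding something like `weights = FemaleFall`, assuming that column holds the female counts):

```r
library(mgcv)

set.seed(42)
n <- 300
dat <- data.frame(
  x       = runif(n),                        # hypothetical covariate
  females = sample(1:60, n, replace = TRUE)  # hypothetical female counts
)
# Hypothetical recruitment rate varying smoothly with x; Poisson fawn counts
dat$fawns <- rpois(n, lambda = dat$females * exp(-0.5 + sin(2 * pi * dat$x)))
dat$ratio <- dat$fawns / dat$females

# Default Gaussian family, as in the original model; each ratio's weight is
# proportional to the number of females it is based on
fit_w <- gam(ratio ~ s(x), weights = females, method = "REML", data = dat)
summary(fit_w)
```

This keeps the original model structure, but a Gaussian model can still predict negative ratios, which is one reason to prefer the count-plus-offset approach in the answer below.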
Do not use the ratio as your outcome. Instead, model the fawn counts as the outcome and bring in the female counts through an offset() term, using logged values on the RHS of the formula. You should offset with the log of the female count. The formula would look like this:
```r
Fawns ~ s(year) +
  all_those_smooth_terms +
  offset(lnFemale_counts)
```
A Poisson GAM uses a log link by default, which is why the female counts are logged before they enter the offset.
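Written out, with $F_i$ fawns and $N_i$ females for year/site $i$ and smooth functions $f_j$ of the covariates $x_{ij}$, the offset enters the linear predictor with its coefficient fixed at 1, so the smooth terms describe the log of fawns per female:

$$
\begin{aligned}
F_i &\sim \text{Poisson}(\mu_i),\\
\log \mu_i &= \log N_i + \beta_0 + \sum_j f_j(x_{ij}),\\
\frac{\mu_i}{N_i} &= \exp\Big(\beta_0 + \sum_j f_j(x_{ij})\Big).
\end{aligned}
$$

Under this model $\operatorname{Var}(F_i/N_i) = \mu_i/N_i^2 = (\mu_i/N_i)/N_i$, so year/site combinations with few females automatically carry less information about recruitment, which is the down-weighting the question asks for.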
Edit (Gavin is correct: the default family for gam is gaussian, with an identity link rather than a log link, so the Poisson family must be specified explicitly):
```r
library(mgcv)

# Drop rows with FemaleFall == 0 first: log(0) is -Inf
gamFIN <- gam(FawnFall ~ s(year) + s(percentage_woody_coverage) + s(kmRoads.km2) +
                s(WELLS_ACTIVEinsideD) + s(d3) + s(WT_DEER_springsurveys) +
                s(BadlandsCoyote.1000_mi) + s(Average_mintemp_winter, BadlandsCoyote.1000_mi) +
                s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD) + s(BadlandsCoyote.1000_mi, d3) +
                s(YEAR, bs = "re") + s(StudyArea, bs = "re") +
                offset(log(FemaleFall)),  # offset is the log of the female counts
              family = poisson(), method = "REML", select = TRUE, data = mydata)
```
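A self-contained sketch on simulated data shows the offset model end to end. The data frame `sim` and its columns `x`, `FemaleFall` and `FawnFall` are hypothetical stand-ins. Because an offset written in the formula is included by `predict.gam()`, setting `FemaleFall = 1` (so the offset is log(1) = 0) returns predicted recruitment as fawns per female. The last part uses plain parametric `glm()` fits, where the comparison is exact, to check that a Poisson model with a log-female offset gives the same coefficients as a quasi-Poisson model of the ratio weighted by the female counts; in other words, the offset approach is a principled form of the weighting asked about in the question.

```r
library(mgcv)

set.seed(7)
n <- 400
sim <- data.frame(
  x          = runif(n),                        # hypothetical covariate
  FemaleFall = sample(1:80, n, replace = TRUE)  # females counted (all > 0)
)
# Hypothetical recruitment rate (fawns per female) varying smoothly with x
rate <- exp(-0.7 + 0.8 * sin(2 * pi * sim$x))
sim$FawnFall <- rpois(n, lambda = sim$FemaleFall * rate)

# Fawn counts as the response, log(female counts) as the offset
fit_off <- gam(FawnFall ~ s(x) + offset(log(FemaleFall)),
               family = poisson(), method = "REML", data = sim)

# Predicted fawns per female: with FemaleFall = 1 the offset is log(1) = 0
new <- data.frame(x = c(0.25, 0.75), FemaleFall = 1)
pred_rate <- predict(fit_off, newdata = new, type = "response")
pred_rate

# Parametric check: offset model vs ratio model weighted by female counts
fit_glm_off <- glm(FawnFall ~ x + offset(log(FemaleFall)),
                   family = poisson(), data = sim)
fit_glm_w <- glm(FawnFall / FemaleFall ~ x, weights = FemaleFall,
                 family = quasipoisson(), data = sim)
all.equal(coef(fit_glm_off), coef(fit_glm_w), tolerance = 1e-6)
```

The two `glm()` fits share the same score equations, so their coefficients agree; only the dispersion handling differs (fixed at 1 for Poisson, estimated for quasi-Poisson).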