Reputation: 33
I am trying to fit an additive mixed model using bam (mgcv package). My dataset has 10^6 observations from a longitudinal study of growth in 2×10^5 children nested in 300 health centers. I am looking for the slope for each center. The model is:
bam(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + center + year + year*center + s(child, bs = "re"), data = data)
However, when I try to fit the model, the following error message appears:
Error: cannot allocate vector of size 99.6 Gb
In addition: Warning message:
In matrix(by, n, q) : data length exceeds size of matrix
I am working on a cluster with 500 GB of RAM.
Thank you for any help
Upvotes: 3
Views: 1823
Reputation: 226751
To diagnose more precisely where the problem is, try fitting your model with various terms left out. There are several terms in the model that could blow up on you:
The center term will blow up to 300 columns * 10^6 rows; depending on whether year is numeric or a factor, the year*center term could blow up to 600 columns or (nyears*300) columns.
Check whether bam uses sparse matrices for s(., bs="re") terms; if not, you'll be in big trouble (2*10^5 columns * 10^6 rows).
As an order of magnitude, a vector of 10^6 numeric values (one column of your model matrix) takes 7.6 MB, so 500 GB / 7.6 MB would be approximately 65,000 columns ...
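The column counts and memory figures above can be checked with a few lines of base R. This is only a rough sketch: the toy design (5 centers, 3 years) exists purely to count columns, and n_years = 10 is a hypothetical value, since the question does not say how many years were observed.

```r
# Toy design, only for counting columns: 5 centers observed in 3 years.
toy <- expand.grid(center = factor(1:5), year = 2001:2003)

# Numeric year: intercept + 4 center dummies + 1 slope + 4 slope offsets.
p_numeric <- ncol(model.matrix(~ center * year, data = toy))
# Factor year: one column per center-by-year cell.
p_factor <- ncol(model.matrix(~ center * factor(year), data = toy))

# Scale the same counts up to the sizes given in the question.
n_rows   <- 1e6
n_center <- 300
n_child  <- 2e5
n_years  <- 10                      # hypothetical: not stated in the question
bytes_per_col <- 8 * n_rows         # one column of doubles
mb_per_col    <- bytes_per_col / 2^20

cols <- c(year_numeric = 2 * n_center,
          year_factor  = n_years * n_center,
          child_dense  = n_child)
dense_gb <- cols * bytes_per_col / 2^30

# Columns that fit in 500 GB if nothing else needed memory.
max_cols <- floor(500 * 2^30 / bytes_per_col)

print(c(p_numeric = p_numeric, p_factor = p_factor))
print(round(mb_per_col, 1))
print(round(dense_gb, 1))
print(max_cols)
```

Under these assumptions the fixed-effect terms need only a few GB as a dense matrix, while a dense child indicator matrix alone would exceed 500 GB. Fitting typically needs more than one copy of the model matrix, so the real limit is lower than max_cols suggests.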
Just taking a guess here, but I would try out the gamm4 package. It's not specifically geared for low-memory use, but:
‘gamm4’ is most useful when the random effects are not i.i.d., or when there are large numbers of random coeffecients [sic] (more than several hundred), each applying to only a small proportion of the response data.
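To illustrate why this matters for the child effect: lme4, which gamm4 uses for fitting, stores the random-effects design matrix in sparse form. The sketch below uses the Matrix package and sizes scaled down tenfold from the question (an assumption made to keep it quick to run); the child assignment is random and purely illustrative.

```r
library(Matrix)

set.seed(1)
n_obs   <- 1e5   # scaled down from 10^6 observations
n_child <- 2e4   # scaled down from 2*10^5 children

# Each observation belongs to exactly one child, so the indicator
# matrix has a single nonzero entry per row.
child <- sample.int(n_child, n_obs, replace = TRUE)
Z <- sparseMatrix(i = seq_len(n_obs), j = child, x = 1,
                  dims = c(n_obs, n_child))

sparse_mb <- as.numeric(object.size(Z)) / 2^20   # actual sparse storage
dense_gb  <- 8 * n_obs * n_child / 2^30          # hypothetical dense storage

print(round(sparse_mb, 1))
print(round(dense_gb, 1))
```

The sparse object takes on the order of a megabyte, whereas the same matrix stored densely would need roughly 15 GB even at this reduced size.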
I would also make most of the terms into random effects:
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age),
             random = ~ (1|center) + (1|year) + (1|year:center) + (1|child),
             data = data)
or, if there are not very many years in the data set, treat year as a fixed effect:
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + year,
             random = ~ (1|center) + (1|year:center) + (1|child),
             data = data)
If there are only a small number of years, then (year|center) might make sense, to assess among-center variation and covariation among years ... if there are many years, consider making it a smooth term instead ...
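Since the question asks for a slope per center, here is a minimal sketch of how a (year|center) term could be used for that, assuming year is treated as a centred numeric covariate (one possible reading; with year as a factor the term would instead give a separate effect per year). The data are simulated, and names such as year_c and center_slopes are made up for the example.

```r
library(gamm4)  # fits via lme4; returns a list with $gam and $mer parts

set.seed(1)
n_center <- 20; n_child <- 10; n_visit <- 6   # small simulated study
d <- expand.grid(visit = seq_len(n_visit),
                 kid = seq_len(n_child),
                 center = seq_len(n_center))
d$child  <- factor(paste(d$center, d$kid, sep = "_"))  # unique child ids
d$center <- factor(d$center)
d$year_c <- d$visit - mean(d$visit)                    # centred numeric year
d$age    <- runif(nrow(d), 0, 60)

# Simulated truth: common slope 0.05 plus a center-specific deviation.
slope_dev <- rnorm(n_center, 0, 0.1)
child_dev <- rnorm(nlevels(d$child), 0, 0.3)
d$haz <- sin(d$age / 10) +
  (0.05 + slope_dev[as.integer(d$center)]) * d$year_c +
  child_dev[as.integer(d$child)] + rnorm(nrow(d), 0, 0.5)

# Random intercept and year slope per center, random intercept per child.
fit <- gamm4(haz ~ s(age) + year_c,
             random = ~ (year_c | center) + (1 | child),
             data = d)

# Center slope = overall year coefficient + that center's random deviation.
re <- lme4::ranef(fit$mer)$center
center_slopes <- coef(fit$gam)[["year_c"]] + re[["year_c"]]
print(round(center_slopes, 3))
```

On a data set of the question's size, it would be worth trying this on a subsample of centers first to check run time and convergence before fitting the full model.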
Upvotes: 7