Reputation: 1756
I have a data set with 19 columns and more than 10 million rows, and I want to fit a negative binomial regression to it.
Since memory is the bottleneck, I planned to use the `ff` package to deal with the issue. But it turned out that the `glm.nb` function in the `MASS` package cannot be used in this case. There is also the `ffbase` package, which has some enhanced functions, but no `glm.nb`. The `bigmemory` and `biganalytics` packages have the same problem.
I don't know whether my understanding is correct, or whether there is in fact a feasible way to combine `ff` and `MASS`. How should I proceed?
PS: I use Windows, which seems to be a curse when dealing with such large data.
Any links, comments, or tips are appreciated!
Upvotes: 1
Views: 557
Reputation: 94277
Take a random sample of your data points. Do the analysis. Repeat. Estimate the variance due to this Monte Carlo process. If your resulting parameters are still significantly non-zero, then stop.
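A minimal sketch of that procedure in R, assuming each subsample fits in RAM and that row indexing on your data object returns an ordinary in-memory data frame. The helper name `fit_subsamples`, its arguments, and the simulated `toy` data are illustrative only and not part of any package; the only modelling call used is `MASS::glm.nb`.

```r
library(MASS)  # provides glm.nb

# Hypothetical helper: fit glm.nb on repeated random subsamples and
# summarise how much the coefficient estimates vary between subsamples.
fit_subsamples <- function(data, formula, n_rep = 20, size = 1e5) {
  est <- replicate(n_rep, {
    idx <- sample.int(nrow(data), size)          # random rows, no replacement
    coef(glm.nb(formula, data = data[idx, , drop = FALSE]))
  })
  # est has one row per coefficient and one column per replicate
  cbind(mean = rowMeans(est), sd = apply(est, 1, sd))
}

# Simulated stand-in for the real data (assumption: one count response y,
# one predictor x). Replace with your own data and formula.
set.seed(1)
n <- 20000
x <- rnorm(n)
toy <- data.frame(x = x,
                  y = rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 2))

res <- fit_subsamples(toy, y ~ x, n_rep = 10, size = 2000)
print(res)
```

The `sd` column describes the spread of estimates across subsamples of the chosen size. If a coefficient's mean is still several standard deviations away from zero, fitting the full 10 million rows is unlikely to change the conclusion.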
Upvotes: 4