user3088823

Reputation: 81

increasing memory in R

I'm working with a large data set (41,000 observations and 22 predictor variables) and trying to fit a Random Forest model using this code:

model <- randomForest(as.factor(data$usvsa) ~ ., ntree=1000, importance=TRUE, proximity=TRUE, data=data)

I am running into the following error:

Error: cannot allocate vector of size 12.7 Gb
In addition: Warning messages:
1: In matrix(0, n, n) :
  Reached total allocation of 6019Mb: see help(memory.size)
2: In matrix(0, n, n) :
  Reached total allocation of 6019Mb: see help(memory.size)
3: In matrix(0, n, n) :
  Reached total allocation of 6019Mb: see help(memory.size)
4: In matrix(0, n, n) :
  Reached total allocation of 6019Mb: see help(memory.size)
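
If I am reading the warnings right, the failing allocation matches what a single n x n matrix of doubles would need for my n = 41,000 observations (rough arithmetic on my part, not R output):

n <- 41000
n * n * 8 / 1024^3   # size of one n x n matrix of doubles, in Gb
# roughly 12.5, about the 12.7 Gb the error asks for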

I have done some reading in the R help on memory limits and on this site, and I am thinking that I need to buy 12+ GB of RAM, since my memory limit is already set to about 6 GB (my computer only has 6 GB of RAM). But first I wanted to double-check that this is the only solution. I am running Windows 7 with a 64-bit processor and 6 GB of RAM. Here is the R sessionInfo():

sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-7

loaded via a namespace (and not attached):
[1] tools_2.15.3

Any tips?

Upvotes: 1

Views: 2539

Answers (3)

Maxim.K

Reputation: 4180

The solution to your problem is actually pretty simple, and you don't have to sacrifice the quality of your analysis or invest in local RAM (which may still turn out to be insufficient). Simply make use of a cloud computing service, such as Amazon's AWS or whichever provider you choose.

Basically, you rent a virtual machine whose RAM can be scaled up as you need it; I have used a 64 GB server at one point. Choose Linux, install R and the libraries, upload your data and scripts, and run your analysis. If it completes quickly, the whole procedure will not cost much (most likely under $10). Good luck!
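
Roughly, the session on the rented machine would look like the sketch below (the file names are placeholders; the model call is the one from the question):

install.packages("randomForest")   # one-time setup on the cloud VM
library(randomForest)
data <- read.csv("data.csv")       # or however your uploaded data is stored
model <- randomForest(as.factor(data$usvsa) ~ ., ntree=1000, importance=TRUE, proximity=TRUE, data=data)
save(model, file="model.RData")    # copy the fitted object back down afterwards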

Upvotes: 1

Dirk is no longer here

Reputation: 368201

Quoting from the wonderful paper "Big Data: New Tricks for Econometrics" by Hal Varian:

If the extracted data is still inconveniently large, it is often possible to select a subsample for statistical analysis. At Google, for example, I have found that random samples on the order of 0.1 percent work for analysis of economic data.

So how about if you don't use all 41k rows and 22 predictors?
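
A sketch of that idea in R (the object and column names are the ones from the question; the subsample size is arbitrary):

library(randomForest)
set.seed(1)                              # make the subsample reproducible
idx <- sample(nrow(data), size = 4000)   # roughly 10% of the 41k rows
sub <- data[idx, ]
model <- randomForest(as.factor(usvsa) ~ ., data=sub, ntree=1000,
                      importance=TRUE, proximity=TRUE)
# note: the response is referenced through data=sub rather than data$usvsa,
# and with n = 4,000 the proximity matrix needs ~0.12 Gb instead of ~12.5 Gb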

Upvotes: 2

Scott Ritchie

Reputation: 10543

Yes, you simply need to buy more RAM. By default R will use all of the memory available to it (at least on OS X and Linux).
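
On Windows, which the question is on, R caps allocations at memory.limit(); a sketch of checking that cap and, once more RAM is installed, raising it (the 16 GB figure is just an example):

memory.limit()               # current cap in MB (about 6019 in the question)
memory.size(max=TRUE)        # most memory R has used so far, in MB
memory.limit(size=16000)     # raise the cap after upgrading, e.g. to ~16 GB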

Upvotes: 1
