alex

Reputation: 227

Issue with randomForest & long vectors

I am running random forest on a data set with 8 numeric columns (the predictors), and 1 factor (the outcome). There are 1.2M rows in the dataset. When I do:

randomForest(outcome.f ~ a + b + c + d + e + f + g + h, data=mdata), I get an error:

"Error in randomForest.default(m, y, ...) : 
 long vectors (argument 26) are not supported in .Fortran"

Is there any way to prevent this? I don't understand why the package is (apparently) trying to allocate a vector of length 2^31-1. I'm using Mac OS X 10.9.2, with an Intel Core i7 (in case the architecture matters).

Session info

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-7

loaded via a namespace (and not attached):
[1] tools_3.1.0

Upvotes: 14

Views: 8617

Answers (5)

Mario Becerra

Reputation: 534

I've had this issue before, and it was solved by passing proximity = FALSE. That way the proximity matrix (which is n-by-n, so 1.2M × 1.2M in your case) is never computed, and R is able to finish the process.
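For reference, a minimal sketch of that fix (assuming the randomForest package is installed; the small simulated data frame below is a made-up stand-in for the asker's 1.2M-row mdata):

```r
library(randomForest)

# Small simulated stand-in for the asker's data: numeric predictors, factor outcome
set.seed(1)
n <- 1000
mdata <- data.frame(a = rnorm(n), b = rnorm(n),
                    outcome.f = factor(sample(c("yes", "no"), n, replace = TRUE)))

# proximity = FALSE (the default) skips building the n-by-n proximity matrix
fit <- randomForest(outcome.f ~ a + b, data = mdata, proximity = FALSE)
print(fit)
```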

Upvotes: 3

CapnShanty

Reputation: 559

I just had this error pop up because my "y" dataset was actually NULL, so be mindful of that and check that your y vector isn't empty.

Upvotes: 0

Erick Stone

Reputation: 809

The connection that I think needs to be made is this: even on 64-bit R, randomForest hands its work to compiled Fortran code via .Fortran, and that interface does not support "long vectors" (longer than 2^31 - 1 elements). So a training set or tree size that is too large triggers the error; reduce the tree size and the training size to compensate.

Upvotes: 0

gcamargo

Reputation: 3961

You can also reduce the number of trees (ntree).
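For example (ntree defaults to 500; the data frame here is a made-up stand-in, and maxnodes is an optional extra lever for shrinking individual trees):

```r
library(randomForest)

set.seed(1)
n <- 1000
mdata <- data.frame(a = rnorm(n), b = rnorm(n),
                    outcome.f = factor(sample(c("yes", "no"), n, replace = TRUE)))

# Fewer, smaller trees shrink the vectors handed down to the Fortran code
fit <- randomForest(outcome.f ~ ., data = mdata, ntree = 100, maxnodes = 64)
```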

Upvotes: 2

dhany1024

Reputation: 133

Never run randomForest with too many rows in the training set. Instead, fit separate forests on chunks of the data and combine them:

rf1 <- randomForest(Outcome ~ ., train[1:600000, ], ntree = 500, norm.votes = FALSE, do.trace = 10, importance = TRUE)
rf2 <- randomForest(Outcome ~ ., train[600001:1200000, ], ntree = 500, norm.votes = FALSE, do.trace = 10, importance = TRUE)
rf.combined <- combine(rf1,rf2)

If you still get the error, try reducing the size of each chunk further (e.g. 500,000 or 100,000 rows), split into rf1, rf2, and rf3, then combine them. Hope it helps.

Upvotes: 7
