JEquihua

Reputation: 1227

Memory error using rfImpute from randomForest package in R

I want to fill in the missing values in a data set I'm currently working on. The data has 13300 observations and 9 features. I want to run a random forest, so I tried using rfImpute to fill in these missing values. I get the following error: cannot allocate vector of size 678.4 Mb. I'm running this on a Windows machine with 8 GB of RAM. This is the call I make:

datos.imputados <- rfImpute(vo~P4.Plan.Esp+P11.Comprador+SegmentoDisipado+PersMcKinsey+Kids+IndefDulceSal+lugarcons+Compania,data=datos,ntrees=300,iter=6)

What is going on here? 678 MB doesn't sound like a lot...

Upvotes: 2

Views: 1479

Answers (2)

aaron

Reputation: 6489

I had the same problem using rfImpute on a Mac mini with 16 GB of RAM and a hyperthreaded quad-core CPU. For everyday data analysis problems there's not much that machine can't handle. The problem is that rfImpute works by generating a proximity matrix. The proximity matrix is N x N, which for your application means that rfImpute creates an internal object with 13300^2 entries. In my case it was 93000^2.
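To put numbers on this, here is a rough back-of-the-envelope sketch. It assumes the proximity matrix is stored densely with 8-byte doubles; `proximity_mib` is a hypothetical helper, not part of randomForest. Keep in mind that the size in R's error message is only the single allocation that failed, not the total memory in use.

```r
# Hypothetical helper: approximate size in MiB of a dense N x N matrix,
# assuming 8 bytes per entry (R's double).
proximity_mib <- function(n, bytes_per_entry = 8) {
  n^2 * bytes_per_entry / 2^20
}

mib_question <- proximity_mib(13300)  # the asker's 13300 observations
mib_mine     <- proximity_mib(93000)  # my 93000 observations

# Roughly 1.3 GiB versus roughly 64 GiB
cat(sprintf("13300 rows: %.0f MiB\n", mib_question))
cat(sprintf("93000 rows: %.1f GiB\n", mib_mine / 1024))
```

Because the size grows with the square of N, cutting the data into 8 slices shrinks each slice's matrix by a factor of 64.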

One thing that you can do is split the data up into K different segments and apply rfImpute to each slice, manually recombining afterwards:

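library(randomForest)

# X: data frame of predictors (with NAs), Y: response vector
slices <- 8
# Assign each row to one of the slices
idx <- rep(1:slices, each = ceiling(nrow(X)/slices))
idx <- idx[1:nrow(X)]

# Impute each slice separately, then stack the results
imputedData <- do.call('rbind', lapply(1:slices, function(SLICE){
    print(SLICE)
    rfImpute(X[idx == SLICE, ], Y[idx == SLICE])
  }))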

You can parallelize this using parLapply as follows:

library(parallel)

slices <- 8
idx <- rep(1:slices, each = ceiling(nrow(X)/slices))
idx <- idx[1:nrow(X)]

cl <- makeCluster(8)
# Make the data and slice assignments available on every worker
clusterExport(cl, c('idx', 'slices', 'X', 'Y'))
imputedData <- do.call('rbind', parLapply(cl, 1:slices, function(SLICE){
    # Each worker needs randomForest loaded
    require(randomForest)
    rfImpute(X[idx == SLICE, ], Y[idx == SLICE])
  }))
stopCluster(cl)

Upvotes: 3

leo

Reputation: 3749

I had the same problem. As Roland described in the comments, you need an additional 700 MB of memory, which you might not have available at this stage.

You could either try to free up memory or look at a less sophisticated imputation method, such as impute, described here: https://stackoverflow.com/a/13114887/55070.
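As an illustration of what a less sophisticated method can look like, here is a minimal base-R sketch that fills numeric columns with the median and factor or character columns with the most frequent value. This is not necessarily the method from the linked answer; `impute_simple` and the `toy` data are hypothetical names made up for this example, and the helper assumes every column has at least one observed value. The randomForest package also ships `na.roughfix`, which does the same kind of median/mode fill.

```r
# Hypothetical helper: replace NAs with the column median (numeric)
# or the most frequent value (factor/character).
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)  # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

# Made-up toy data, only to show the call
toy <- data.frame(
  a = c(1, NA, 3, 10),
  b = factor(c("x", "y", NA, "y"))
)
imputed <- impute_simple(toy)
print(imputed)
```

This never builds an N x N object, so memory use stays proportional to the size of the data itself.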

Upvotes: 2
