bdeonovic

R: parallel process several rows together using doParallel

I would like to do some parallel processing on a large data frame in R using the doParallel package. Let's call the data frame mydata. I want to iterate over the data frame by rows, with something like

foreach(x=iter(mydata, by='row')) %dopar% {
    # ... do stuff with the row x ...
}

However, that's not quite right, because in each iteration I need access to several rows. Let's say the variable idx contains the information about which rows need to be processed together, and that idx is a matrix that looks like

1  2  3
10 12 14
4  7  9
...

where each row indicates the rows of mydata that need to be processed together. How can I do this using the doParallel package?
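A minimal sketch of one way to do this, assuming two workers and small hypothetical stand-ins for mydata and idx: loop over the rows of idx instead of the rows of mydata, and let each task pull its own rows out of mydata. `sum(block$B)` is only a placeholder for the real work.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)  # assumption: two workers; adjust as needed

# Hypothetical stand-ins for the question's mydata and idx
mydata <- data.frame(A = letters[1:15], B = 1:15)
idx <- rbind(c(1, 2, 3),
             c(10, 12, 14),
             c(4, 7, 9))

# Iterate over the rows of idx, not the rows of mydata. Each task subsets
# mydata itself, so the grouped rows need not be consecutive.
res <- foreach(i = seq_len(nrow(idx))) %dopar% {
  block <- mydata[idx[i, ], ]
  sum(block$B)  # placeholder for the real "do stuff" step
}

stopImplicitCluster()
res
```

The trade-off is that every worker needs access to the whole of mydata, which can be costly for a large data frame on a socket cluster.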

EDIT: I see that I can send "blocks" of the data frame using iblkcol. Is there a way to send non-consecutive blocks of my choosing?

EDIT: I ended up using a custom iterator:

> data <- data.frame(A=sample(letters,10),B=rnorm(10))
> data
   A          B
1  z  0.5105797
2  h  1.2559502
3  a  0.9697254
4  n -1.4189076
5  e -0.5800640
6  b  0.2907486
7  q -2.4414012
8  d  1.8146928
9  v  0.2510003
10 x -0.2011185
> idx <- list(c(1,2),c(4,5),c(3,6,7),c(8,9,10))
> 
> library(iterators)
> 
> ialn <- function( x, idx){
+   it <- iter(idx)
+   nextEl <- function(){
+     n <- nextElem(it)
+     x[n,]
+   }
+   obj <- list(nextElem=nextEl)
+   class(obj)<- c('ialn','abstractiter','iter')
+   obj
+ }
> 
> 
> it <- ialn(data,idx)
> nextElem(it)
  A         B
1 z 0.5105797
2 h 1.2559502
> nextElem(it)
  A         B
4 n -1.418908
5 e -0.580064
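A sketch of how the ialn iterator above might be plugged into foreach with doParallel, assuming two workers and a deterministic stand-in for the random data frame. Because the object's class includes "iter", foreach can consume it directly and each task receives one block of rows; `sum(block$B)` is only a placeholder.

```r
library(foreach)
library(doParallel)
library(iterators)

# Custom iterator from the question: each element is the block of rows
# of x listed in the corresponding element of idx.
ialn <- function(x, idx) {
  it <- iter(idx)
  nextEl <- function() {
    n <- nextElem(it)   # signals 'StopIteration' once idx is exhausted
    x[n, ]
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("ialn", "abstractiter", "iter")
  obj
}

registerDoParallel(cores = 2)  # assumption: two workers

data <- data.frame(A = letters[1:10], B = 1:10)  # hypothetical deterministic data
idx  <- list(c(1, 2), c(4, 5), c(3, 6, 7), c(8, 9, 10))

# foreach pulls one block at a time from the iterator and hands it to a task
sums <- foreach(block = ialn(data, idx), .combine = c) %dopar% {
  sum(block$B)   # placeholder for the real per-block work
}

stopImplicitCluster()
sums
```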

Answers (1)

Thell

Perhaps splitting mydata into a list based on

apply(idx,1,function(idx) list(mydata[idx,]) )

and then sending that list through the foreach?
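A minimal sketch of the pre-split approach, assuming two workers and hypothetical stand-ins for mydata and idx. It swaps in lapply over the row numbers of idx instead of the apply() call above, so that each list element is the data frame itself with no extra list() wrapper; foreach then iterates over that list.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)  # assumption: two workers

# Hypothetical stand-ins for mydata and idx
mydata <- data.frame(A = letters[1:15], B = 1:15)
idx <- rbind(c(1, 2, 3), c(10, 12, 14), c(4, 7, 9))

# One data frame per row of idx
blocks <- lapply(seq_len(nrow(idx)), function(i) mydata[idx[i, ], ])

# foreach iterates over the list, so each task sees only its own block
res <- foreach(block = blocks, .combine = c) %dopar% {
  sum(block$B)  # placeholder for the real per-block work
}

stopImplicitCluster()
res
```

Splitting up front means each task only needs its own block rather than all of mydata, at the cost of holding the split copies in memory on the master.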

Either that or a custom iterator that gets the data based on the row indexes.
