Reputation: 51
This is a very simple example.
df = c("already ","miss you","haters","she's cool")
df = data.frame(df)
library(doParallel)
cl = makeCluster(4)
registerDoParallel(cl)
foreach(i = df[1:4,1], .combine = rbind, .packages='tm') %dopar% classification(i)
stopCluster(cl)
In the real case I have a data frame with n = 400000 rows. I don't know how to send a block of nrow/ncluster rows to each worker in a single step, i.e. what should i = ? be.
I tried isplitRows from the itertools package without success.
Upvotes: 5
Views: 4192
Reputation: 10350
You should try to work with indices to create subsets of your data.
foreach(i = 1:nrow(df), .combine = rbind, .packages='tm') %dopar% {
tmp <- df[i, ]
classification(tmp)
}
This will take a new row of the data.frame on each iteration.
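The same index idea extends to blocks of rows, which is closer to what the question asks for: one chunk per worker instead of one row per task. The following is only a sketch under assumptions: toy_classify is a hypothetical stand-in for classification(), the five-row data frame is made-up toy data, and splitIndices() from the parallel package is used to build the index chunks. Add .packages='tm' back if your real function needs it.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for the question's classification() function
toy_classify <- function(txt) {
  data.frame(text = txt, n_chars = nchar(txt), stringsAsFactors = FALSE)
}

# Toy data, assumed for illustration only
df <- data.frame(df = c("already ", "miss you", "haters", "she's cool", "fine"),
                 stringsAsFactors = FALSE)

n_workers <- 2
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# List of n_workers contiguous index vectors that together cover 1..nrow(df)
chunks <- parallel::splitIndices(nrow(df), n_workers)

# Each task receives one whole chunk of row indices
res <- foreach(idx = chunks, .combine = rbind) %dopar% {
  toy_classify(df[idx, 1])
}

stopCluster(cl)
```

Because the chunks are contiguous and foreach combines results in task order, res should keep the original row order.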
Furthermore, note that a foreach loop returns its combined result instead of modifying variables in place. Thus, you should assign it to a new variable like this:
res <- foreach(i = 1:10, .combine = c, ....) %dopar% {
# things you want to do
x <- someFancyFunction()
# the last value will be returned and combined by the .combine function
x
}
Upvotes: 8
Reputation: 51
My solution after your comments:
n = 8 # number of workers in the cluster
library(foreach)
library(doParallel)
cl = makeCluster(n)
registerDoParallel(cl)
z = nrow(df)
y = floor(z/n)
x = nrow(df)%%n
ris = foreach(i = split(df[1:(z-x),],rep(1:n,each=y)), .combine = rbind, .packages='tm') %dopar% someFancyFunction(i)
stopCluster(cl)
# process the x leftover rows sequentially
if (x != 0) {
  ris = rbind(ris, someFancyFunction(df[(z-x+1):z,1]))
}
Note: I used sequential execution at the end because, if x is not zero, split recycles the grouping vector and puts the leftover x rows into the first chunk, which changes the order of the result.
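One possible way to avoid the sequential remainder step is to build a grouping vector that covers every row, so that the last chunk is simply shorter than the others. This is a sketch with made-up toy data (the id and txt columns are assumptions, not part of the original problem). It produces at most n chunks, and for some combinations of z and n it can produce fewer.

```r
# Toy data, assumed for illustration only
z <- 10   # number of rows
n <- 4    # desired number of chunks
df <- data.frame(id = seq_len(z), txt = letters[seq_len(z)],
                 stringsAsFactors = FALSE)

# Chunk size rounded up, so no rows are left over
chunk_size <- ceiling(z / n)

# One group label per row: consecutive rows share a label
grp <- ceiling(seq_len(z) / chunk_size)

# List of data frames, in the original row order
chunks <- split(df, grp)

# Binding the chunks back together restores the original order
restored <- do.call(rbind, chunks)
```

The resulting list could then be passed to foreach as i = chunks in place of the split(...) call above.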
Upvotes: 0
Reputation: 2535
Try using a combination of split and mclapply, as proposed in Approach 1 here: https://www.r-bloggers.com/trying-to-reduce-the-memory-overhead-when-using-mclapply/
split lets you split data into groups defined by a factor, or you can just use 1:nrow(df) if you want to do the operation on each row separately.
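A rough sketch of that combination, not taken from the linked post: toy_classify is a hypothetical stand-in for the real per-chunk function, and the data frame is toy data. mclapply relies on forking, which is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical stand-in for the real per-chunk work
toy_classify <- function(chunk) {
  data.frame(text = chunk$txt, n_chars = nchar(chunk$txt),
             stringsAsFactors = FALSE)
}

# Toy data, assumed for illustration only
df <- data.frame(txt = c("already ", "miss you", "haters", "she's cool"),
                 stringsAsFactors = FALSE)

# Forking is unavailable on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Contiguous group labels, one per row (e.g. 1 1 2 2 for two cores)
grp <- sort(rep(seq_len(n_cores), length.out = nrow(df)))

# One data frame per group, processed in parallel
chunks <- split(df, grp)
res_list <- mclapply(chunks, toy_classify, mc.cores = n_cores)

# mclapply returns results in input order, so row order is preserved
res <- do.call(rbind, res_list)
```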
Upvotes: 0