Is mclapply guaranteed to return its results in order?

I'm working with mclapply from the multicore package (on Ubuntu), and I'm writing a function that requires the results of mclapply(x, f) to be returned in order (that is, f(x[1]), f(x[2]), ..., f(x[n])).

# multicore doesn't work on Windows

require(multicore)
unlist(mclapply(
    1:10,
    function(x){
        Sys.sleep(sample(1:5, size = 1))
        identity(x)}, mc.cores = 2))

[1] 1 2 3 4 5 6 7 8 9 10

The above code seems to imply that mclapply returns results in the same order as lapply.

However, if this assumption is wrong I'll have to spend a long time refactoring my code, so I'm hoping to get assurance from someone more familiar with this package/parallel computing that this assumption is correct.

Is it safe to assume that mclapply always returns its results in order, regardless of the optional arguments it is given?

Upvotes: 17


Answers (1)

cbeleites


Short answer: it does return the results in the correct order.

But of course, you should read the code yourself (mclapply is an R function...)

The man page for collect gives some more hints:

Note: If expr uses low-level multicore functions such as sendMaster a single job can deliver results multiple times and it is the responsibility of the user to interpret them correctly.

However, if you don't use the low-level functions,

collect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name it will be used to name the result, otherwise its process ID will be used.

(my emphasis)
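The ordering guarantee quoted above can be checked directly. The following is a minimal sketch, assuming the parallel package that ships with R, where the multicore functions parallel and collect are available under the names mcparallel and mccollect. The delays and job names are made up for illustration. Like multicore, it relies on forking and so does not work on Windows.

```r
library(parallel)  # assumption: parallel package (successor of multicore)

# Hypothetical delays: the first job submitted takes longest, so the jobs
# finish in reverse order of submission.
delays <- c(0.3, 0.2, 0.1)

jobs <- lapply(seq_along(delays), function(i) {
  # Fork a child process that sleeps and then returns its own index.
  mcparallel({ Sys.sleep(delays[i]); i }, name = paste0("job", i))
})

# Wait for all jobs; results come back in the order of 'jobs',
# not in the order in which the children finished.
res <- mccollect(jobs)
print(unlist(res))
```

Even though the third job finishes first, its result should still be the third element of res.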

Now for mclapply. A quick glance over the source code yields:

  • if !mc.preschedule and there are no more jobs than cores (length(X) <= cores), parallel and collect are used; see above.
  • if mc.preschedule is set or there are more jobs than cores, mclapply itself takes care of the order; see the code.
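To make the second case more concrete, here is a simplified sketch of how results can be put back in the original order when the work is split across cores up front. It is an illustration of the idea only, not the package's actual code: the worker function f, the variable names, and the round-robin split are assumptions, and the "workers" run sequentially here so the example has no dependency on forking.

```r
# Hypothetical worker function and input, for illustration only.
f <- function(x) x * 10
X <- 1:10
cores <- 2

# Assign indices to cores round-robin: core 1 gets 1, 3, 5, ...; core 2 gets 2, 4, 6, ...
sindex <- lapply(seq_len(cores), function(i) seq(i, length(X), by = cores))

# Each "core" processes its own share (sequentially here; in parallel in practice).
chunks <- lapply(sindex, function(idx) lapply(X[idx], f))

# Reassemble: every chunk is written back to the positions it came from,
# so the final order does not depend on which chunk finished first.
res <- vector("list", length(X))
for (i in seq_len(cores)) {
  res[sindex[[i]]] <- chunks[[i]]
}

print(unlist(res))
```

Because each result is stored at the index of its input, the order in which the chunks complete has no effect on the assembled list.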

However, here's a slightly modified version of your experiment:

> unlist (mclapply(1:10, function(x){
    Sys.sleep(sample(1:5, size = 1)); 
    cat (x, " ");    
    identity(x)}, 
  mc.cores = 2, mc.preschedule = FALSE))
1  2  4  3  6  5  7  8  9  10   [1]  1  2  3  4  5  6  7  8  9 10
> unlist (mclapply(1:10, function(x){
    Sys.sleep(sample(1:5, size = 1)); 
    cat (x, " ");    
    identity(x)}, 
  mc.cores = 2, mc.preschedule = TRUE))
1  3  2  5  4  6  7  8  10  9   [1]  1  2  3  4  5  6  7  8  9 10

This shows that the child jobs return their results in a different order (more precisely: the child jobs are about to finish in a different order), but the result is assembled in the original order.

(This works on the console but not in RStudio, where the cat output does not show up.)
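If you would rather have a programmatic check than read printed output, something along these lines could serve as a regression test. It is only a sketch, assuming the parallel package's mclapply (same arguments as the multicore version used above) and a made-up function f. It compares the parallel result against plain lapply for both settings of mc.preschedule. It needs a platform that supports forking, so not Windows.

```r
library(parallel)  # assumption: parallel package instead of multicore

# Hypothetical worker: random short sleep so that jobs finish out of order.
f <- function(x) {
  Sys.sleep(runif(1, 0, 0.2))
  x^2
}

x <- 1:10
expected <- lapply(x, f)

# Compare against lapply with and without prescheduling.
same_order <- sapply(c(TRUE, FALSE), function(pre) {
  identical(mclapply(x, f, mc.cores = 2, mc.preschedule = pre), expected)
})
names(same_order) <- c("preschedule", "no_preschedule")
print(same_order)
```

A passing check does not prove the guarantee, of course; it only confirms that the observed behaviour matches what the documentation and the source code say.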

Upvotes: 20
