GBatta

Reputation: 41

Parallel assignment operations with nested loops in Julia 1.5

This is closely related to the following post, which was answered very well by Przemyslaw Szufel.

How can I run a simple parallel array assignment operation in Julia?

Given that I have a 40-core machine, I decided to follow Przemyslaw's advice and go with @distributed, rather than Threads, to perform the array assignment operations. This sped things up quite nicely.

The only slight difference between my algorithm and the above user's situation is that I have nested loops. Of course, I could always flatten the array I'm performing the assignment operation on into a single loop, but that would complicate my code. Should I simply put @sync @distributed before the outermost loop and leave it at that? Or would I need to put additional macros before the (two, in my case) inner loops to maximize the benefits of parallelization?

Upvotes: 1

Views: 334

Answers (1)

Przemyslaw Szufel

Reputation: 42244

In the case of distributed loops, you normally want to parallelize only the outermost loop. Why? Because distributing the workload across workers takes a significant amount of time.
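For the nested-loop assignment from the question, that usually means something like the minimal sketch below (the array name and sizes are made up for illustration): @distributed goes on the outermost loop only, and a SharedArray lets all workers write into the same result. Without added worker processes the loop simply runs on the master, so it is safe to try as-is.

```julia
using Distributed, SharedArrays
# addprocs(4)  # uncomment to add worker processes first

# Minimal sketch (array name, sizes, and the assignment are made up):
# parallelize ONLY the outermost loop; the inner loop runs serially
# on whichever worker owns that chunk of i values.
n, m = 100, 50
A = SharedArray{Float64}(n, m)

@sync @distributed for i in 1:n
    for j in 1:m
        A[i, j] = i + j / 2   # some per-element assignment
    end
end
```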

However, there are scenarios where you might want to consider different parallelization strategies.

Let us consider a scenario with unbalanced execution times. @distributed takes a naive approach, splitting the loop into equal chunks between the workers. Suppose you have a loop such as:

for i in 1:100
    for j in 1:i
        ## do some heavy-lifting
    end
end

Putting @distributed before the outer loop will be very inefficient, because all workers will end up waiting for the last chunk, where the largest values of j are processed. This is a typical loop where the value of naively parallelizing the outer loop is going to be almost non-existent. In situations like this there are usually two approaches:

  • lazy approach: parallelize the inner loop instead. This works well when i takes values orders of magnitude greater than the number of cores
  • efficient approach: create a proxy variable k in 1:(100*(100+1)/2), distribute over it, and then calculate the corresponding values of i and j from k
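A hedged sketch of that proxy-variable trick for the loop above (the helper name ij_from_k is mine, not from the post): each k in 1:n*(n+1)÷2 maps back to a unique (i, j) pair, so @distributed can split k into equal chunks even though the work per value of i is triangular.

```julia
# Sketch of the proxy-variable approach: for i in 1:n, j in 1:i there
# are n*(n+1)÷2 iterations in total, so distribute over a single index
# k and recover (i, j) from it on each worker.
n = 100
total = n * (n + 1) ÷ 2        # 5050 for n = 100

# Invert the linear index k = i*(i-1)÷2 + j back to the pair (i, j).
function ij_from_k(k::Integer)
    i = ceil(Int, (-1 + sqrt(1 + 8k)) / 2)
    j = k - i * (i - 1) ÷ 2
    return i, j
end

# The distributed loop then becomes (heavy-lifting body elided):
# @sync @distributed for k in 1:total
#     i, j = ij_from_k(k)
#     ## do some heavy-lifting with (i, j)
# end
```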

Finally, if the job times are heavily unbalanced and the approaches above do not work, you need to use some job-polling mechanism. One way to go could be to use asyncmap to spawn remote tasks; another way would be to use external tools - I usually use some simple bash scripts for that - I published my approach to using bash to parallelize jobs on GitHub: https://github.com/pszufe/KissCluster
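As a rough illustration of the asyncmap route (heavy_job is a made-up stand-in for real work): combining asyncmap with remotecall_fetch on a WorkerPool gives exactly this kind of dynamic polling, because each job is handed to whichever worker frees up first rather than being pre-assigned.

```julia
using Distributed
# addprocs(4)  # uncomment to actually spread jobs over worker processes

# `heavy_job` is a made-up placeholder for an expensive computation.
@everywhere heavy_job(i) = sum(abs2, 1:i)

jobs = 1:20
pool = default_worker_pool()  # with no workers this is just the master

# asyncmap keeps one task per worker in flight; remotecall_fetch on the
# pool sends the next job to the first worker that becomes free.
results = asyncmap(i -> remotecall_fetch(heavy_job, pool, i), jobs;
                   ntasks = nworkers())
```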

Upvotes: 1
