Reputation: 3940
Is there a comfy way to somehow get the 'status' of map/pmap in Julia?
If I had an array a = [1:10], I'd like to either:
1: enumerate the array and use if-conditional to add a print command
((index,value) -> 5*value ......, enumerate(a)
and where the "......." are, there would be a way to 'chain' the anonymous function to something like
"5*value and then print index/length(a) if index%200 == 0"
2: know if there is already an existing option for this, as pmap is intended for parallel tasks, which are usually used for large processes, so it would make sense for this to already exist.
Additionally, is there a way to make anonymous functions do two 'separate' things one after the other?
Example
if I have
a = [1:1000]
function f(n) #something that takes a huge amount of time
end
and I execute
map(x -> f(x), a)
the REPL would print out the status
"0.1 completed"
.
.
.
"0.9 completed"
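(For reference, by 'chaining' I mean something like the following toy sketch, with 5 in place of 200 so the small array actually prints something:)

```julia
a = collect(1:10)
map(enumerate(a)) do t
    index, value = t
    # side effect first: report progress at every 5th index...
    index % 5 == 0 && println(index / length(a), " completed")
    # ...then the last expression is the value returned to `map`
    5 * value
end
```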
Solution
A bit odd that the ProgressMeter package doesn't include this by default.
Pkg.add("ProgressMeter")
Pkg.clone("https://github.com/slundberg/PmapProgressMeter.jl")
@everywhere using ProgressMeter
@everywhere using PmapProgressMeter
pmap(x->begin sleep(1); x end, Progress(10), 1:10)
PmapProgressMeter on github
Upvotes: 2
Views: 834
Reputation: 10980
Why not just include it in your function's definition to print this information? E.g.
function f(n) #something that takes a huge amount of time
...
do stuff.
...
println("completed $n")
end
And, if desired, you can add an extra argument to your function that would contain that 0.1, ..., 0.9 from your example (I'm not quite sure what those are, but whatever they are, they can just be an argument to your function).
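A minimal sketch of that suggestion (the names and the 10% threshold are mine, not from the question): pass the total number of jobs in as a second argument and print a fraction at round milestones:

```julia
# Hypothetical sketch: take the job count as an extra argument and print
# "0.1 completed", "0.2 completed", ... at every 10% mark.
function f(n, total)
    # ... the expensive work goes here ...
    if n % (total ÷ 10) == 0
        println(n / total, " completed")
    end
    return 5n   # placeholder result
end

a = 1:1000
map(x -> f(x, length(a)), a)
```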
If you take a look at the example below on pmap and @parallel you will find an example of a function fed to pmap that prints output.
See also this and this SO post for info on feeding multiple arguments to functions used with map and pmap.
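As a quick sketch of the multiple-argument form those posts cover: map (and pmap) accept several collections and pass one element from each per call:

```julia
# Each call receives the n-th element of every collection as a separate argument.
map((x, y) -> x + y, 1:3, 10:10:30)   # returns [11, 22, 33]
```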
The Julia documentation advises that
pmap() is designed for the case where each function call does a large amount of work. In contrast, @parallel for can handle situations where each iteration is tiny, perhaps merely summing two numbers.
There are several reasons for this. First, pmap incurs greater startup costs initiating jobs on workers, so if the jobs are very small these startup costs may become inefficient. Conversely, however, pmap does a "smarter" job of allocating jobs amongst workers: it builds a queue of jobs and sends a new job to each worker whenever that worker becomes available. @parallel, by contrast, divvies up all work to be done amongst the workers when it is called. As such, if some workers take longer on their jobs than others, you can end up with a situation where most of your workers have finished and are idle while a few remain active for an inordinate amount of time, finishing their jobs. Such a situation, however, is less likely to occur with very small and simple jobs.
The following illustrates this: suppose we have two workers, one of which is slow and the other of which is twice as fast. Ideally, we would want to give the fast worker twice as much work as the slow worker (or we could have fast and slow jobs, but the principle is exactly the same). pmap will accomplish this, but @parallel won't.
For each test, we initialize the following:
addprocs(2)
@everywhere begin
function parallel_func(idx)
workernum = myid() - 1
sleep(workernum)
println("job $idx")
end
end
Now, for the @parallel test, we run the following:
@parallel for idx = 1:12
parallel_func(idx)
end
And get back print output:
julia> From worker 2: job 1
From worker 3: job 7
From worker 2: job 2
From worker 2: job 3
From worker 3: job 8
From worker 2: job 4
From worker 2: job 5
From worker 3: job 9
From worker 2: job 6
From worker 3: job 10
From worker 3: job 11
From worker 3: job 12
It's almost sweet. The workers have "shared" the work evenly. Note that each worker has completed 6 jobs, even though worker 2 is twice as fast as worker 3. It may be touching, but it is inefficient.
For the pmap test, I run the following:
pmap(parallel_func, 1:12)
and get the output:
From worker 2: job 1
From worker 3: job 2
From worker 2: job 3
From worker 2: job 5
From worker 3: job 4
From worker 2: job 6
From worker 2: job 8
From worker 3: job 7
From worker 2: job 9
From worker 2: job 11
From worker 3: job 10
From worker 2: job 12
Now, note that worker 2 has performed 8 jobs and worker 3 has performed 4. This is exactly in proportion to their speed, and what we want for optimal efficiency. pmap is a hard taskmaster: from each according to their ability.
Upvotes: 1
Reputation: 10980
One other possibility would be to use a SharedArray as a counter shared amongst the workers. E.g.
addprocs(2)
Counter = convert(SharedArray, zeros(Int64, nworkers()))
## Make sure each worker has the SharedArray declared on it, so that it need not be fed as an explicit argument
function sendto(p::Int; args...)
for (nm, val) in args
@spawnat(p, eval(Main, Expr(:(=), nm, val)))
end
end
for (idx, pid) in enumerate(workers())
sendto(pid, Counter = Counter)
end
@everywhere global Counter
@everywhere begin
function do_stuff(n)
sleep(rand())
Counter[(myid()-1)] += 1
TotalJobs = sum(Counter)
println("Jobs Completed = $TotalJobs")
end
end
pmap(do_stuff, 1:10)
Upvotes: 0
Reputation: 22215
You can create a function with 'state' as you ask, by implementing a 'closure'. E.g.
julia> F = function ()
ClosedVar = 5
return (x) -> x + ClosedVar
end;
julia> f = F();
julia> f(5)
10
julia> ClosedVar = 1000;
julia> f(5)
10
As you can see, the function f maintains 'state' (i.e. the internal variable ClosedVar is local to F, and f maintains access to it even though F itself has technically long since gone out of scope).
Note the difference with normal, non-closed function definition:
julia> MyVar = 5;
julia> g(x) = x + MyVar;
julia> g(5)
10
julia> MyVar = 1000;
julia> g(5)
1005
You can create your own closure which interrogates / updates its closed variables when run, and does something different according to its state each time.
Having said that, from your example you seem to expect that pmap will run sequentially. This is not guaranteed, so don't rely on a 'which index is this thread processing' approach to print every 200 operations. You would probably have to maintain a closed 'counter' variable inside your closure and rely on that, which presumably also implies your closure needs to be accessible @everywhere.
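A minimal (non-parallel) sketch of such a counter closure, with hypothetical names of my own choosing: wrap the work function so it reports a completion fraction every `step` calls, driven by the counter rather than the index, so out-of-order execution does not matter.

```julia
# Hypothetical sketch: `make_counted` returns a closure over `counter`
# that runs `f`, bumps the counter, and prints progress every `step` calls.
function make_counted(f, total; step = 200)
    counter = 0
    return x -> begin
        result = f(x)
        counter += 1
        counter % step == 0 && println(counter / total, " completed")
        return result
    end
end

g = make_counted(x -> 5x, 1000)
map(g, 1:1000)   # prints 0.2 completed, 0.4 completed, ..., 1.0 completed
```

Note that under pmap each worker would deserialize its own copy of the closure (and hence its own counter), which is why the @everywhere caveat above, or the SharedArray approach in the other answer, matters.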
Upvotes: 1
Reputation: 19132
ProgressMeter.jl has a branch for pmap.
You can also make the Juno progress bar work inside of pmap. This is kind of using undocumented things, so you should ask in the Gitter if you want more information, because posting this publicly will just confuse people if/when it changes.
Upvotes: 3