Elixir: start processes at the very same time

Question

Let's say I have this module

defmodule Loader do

  def spawn_pools(0, _host, _iterations, pids) do
    launch!(pids) # something I want to achieve
  end

  def spawn_pools(pools, host, iterations, pids) do
    pid = spawn_link(__MODULE__, :siege, [host, iterations])
    spawn_pools(pools-1, host, iterations, [pid|pids])
  end

end

So if another module calls Loader.spawn_pools(10, host, iterations, []), it will spawn 10 processes, each executing the function siege.

The problem is that I want it to be as parallel as it can be -- to start the execution of all processes at the very same moment.

So I thought of this

def siege(host, iterations) do
  receive do
    {:launch} -> #...
  end
end

But this brings me back to the same problem: I then need to send :launch to all these processes at the same time, which means recursion again, just another layer of the same problem.

P.S. I'm new to the Erlang/Elixir paradigm, so maybe I'm missing something?

Greg · Accepted Answer

The closest you can get is a list comprehension. It is a language construct and could therefore, in theory, be compiled to execute in parallel (in practice it is not, for the reasons described below). See how the parallel_eval function is written in an official Erlang library. It essentially does something like this:

[spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL]

You can see an example of this in my Erlang code.

If you think about it, it is impossible to start several processes at exactly the same instant, because at the lowest level the physical CPU has to start executing each process sequentially. The Erlang VM needs to allocate memory for the new process (its heap and stack), which, according to the documentation, takes 309 words. Then it needs to pass the initial arguments, add the process to the scheduler, etc. See also this thread, which contains more technical references explaining Erlang processes.
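To make the trade-off concrete, here is a minimal Elixir sketch of the barrier idea from the question: create every process first, let each one block in receive, and only then send the launch message. The module name LaunchSketch and the function run/1 are hypothetical and not part of the original code. The sends still happen one after another, so this does not give a truly simultaneous start; it only removes the process-creation cost from the gap between starts.

```elixir
defmodule LaunchSketch do
  # Hypothetical helper: spawns `count` workers that wait for :launch,
  # releases them all, and returns the spread (in microseconds) between
  # the first and the last worker to start running.
  def run(count) when is_integer(count) and count > 0 do
    parent = self()

    pids =
      for _ <- 1..count do
        spawn_link(fn ->
          receive do
            :launch ->
              # Report the moment this worker was actually released.
              send(parent, {:started, self(), System.monotonic_time(:microsecond)})
          end
        end)
      end

    # Every worker already exists, so the remaining sequential cost
    # is a single message send per pid.
    Enum.each(pids, fn pid -> send(pid, :launch) end)

    # Collect one timestamp per worker.
    times =
      for pid <- pids do
        receive do
          {:started, ^pid, t} -> t
        end
      end

    Enum.max(times) - Enum.min(times)
  end
end
```

Calling LaunchSketch.run(10) returns the observed spread for ten workers; the value depends on the machine and on scheduler load, so treat it as a rough measurement only.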

EDIT:

You can benchmark how long it takes to create one process. This simple code is a quick stab at two approaches:

-module(spawner).

-export([start1/1, start2/1]).

start1(N) ->
    %% Tree depth must be N so that 4^N leaves report back to loop/3.
    start_new1(erlang:monotonic_time(), self(), N),
    loop(round(math:pow(4, N)), 0, []).

start_new1(Start, Pid, N) ->
    Fun = fun() -> child(Start, Pid, N-1) end,
    [spawn(Fun) || _ <- lists:seq(1, 4)].

child(Start, Pid, 0) -> send_diff(Start, Pid);
child(Start, Pid, N) -> start_new1(Start, Pid, N).

loop(All, All, Acc) ->
    {All, lists:sum(Acc)/All, lists:min(Acc), lists:max(Acc)};
loop(All, Y, Acc) ->
    receive Time -> loop(All, Y+1, [Time|Acc]) end.

send_diff(Start, Pid) ->
    Diff = erlang:monotonic_time() - Start,
    Pid ! erlang:convert_time_unit(Diff, native, micro_seconds).


start2(N) ->
    All = round(math:pow(4, N)),
    Pid = self(),
    Seq = lists:seq(1, All),
    Start = erlang:monotonic_time(),

    Fun = fun() -> send_diff(Start, Pid) end,
    [spawn(Fun) || _ <- Seq],
    loop(All, 0, []).

start1/1 spawns a tree of processes: each process spawns 4 child processes. The argument is the number of generations, i.e. there will be 4^N leaf processes (256 for N=4). start2/1 spawns the same number of leaf processes, but sequentially, one by one, from a single process. In both cases the output is the number of processes followed by the average, minimum, and maximum time in microseconds from the start of the benchmark until a process (a leaf, in the case of the tree) begins running.

1> c(spawner).
{ok,spawner}
2> spawner:start1(4).
{256,868.8671875,379,1182}
3> spawner:start2(4).
{256,3649.55859375,706,4829}
4> spawner:start2(5).
{1024,2260.6494140625,881,4529}

Note that in start1, apart from the leaf processes, there are many more supporting processes that live only to spawn children. The time from the start until each leaf child is running appears shorter in the first case, but in my environment start1 did not finish in a reasonable time for N=5. You could take this idea or something similar and tune N and the number of children spawned by each process according to your needs.
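As a rough illustration of that tuning, the tree approach could be sketched in Elixir with both the depth and the number of children per process as parameters. The module name TreeSketch and the function run/2 are hypothetical; this is an adaptation of the idea above under those assumptions, not a port verified against the timings shown.

```elixir
defmodule TreeSketch do
  # Hypothetical helper: `depth` generations, `fanout` children per process.
  # Returns {leaf_count, average, min, max}, with times in microseconds
  # measured from the start of the run until each leaf begins running.
  def run(depth, fanout) when depth > 0 and fanout > 0 do
    parent = self()
    start = System.monotonic_time()
    spawn_children(start, parent, depth, fanout)

    leaves = round(:math.pow(fanout, depth))
    times = collect(leaves, [])
    {leaves, Enum.sum(times) / leaves, Enum.min(times), Enum.max(times)}
  end

  # Each call spawns `fanout` children that are one generation closer to a leaf.
  defp spawn_children(start, parent, depth, fanout) do
    for _ <- 1..fanout do
      spawn(fn -> child(start, parent, depth - 1, fanout) end)
    end
  end

  # A leaf reports how long after `start` it began running.
  defp child(start, parent, 0, _fanout) do
    diff = System.monotonic_time() - start
    send(parent, {:leaf, System.convert_time_unit(diff, :native, :microsecond)})
  end

  # An inner process only exists to spawn the next generation.
  defp child(start, parent, depth, fanout) do
    spawn_children(start, parent, depth, fanout)
  end

  # Wait for exactly `n` leaf reports.
  defp collect(0, acc), do: acc

  defp collect(n, acc) do
    receive do
      {:leaf, t} -> collect(n - 1, [t | acc])
    end
  end
end
```

For example, TreeSketch.run(4, 4) uses the same shape as the Erlang start1(4) call, while TreeSketch.run(8, 2) produces the same 256 leaves with a deeper, narrower tree.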
