Joe Half Face

Reputation: 2333

Elixir: start processes at the very same time

Let's say I have this module:

defmodule Loader do

  def spawn_pools(0, _host, _iterations, pids) do
    launch!(pids) # something I want to achieve
  end

  def spawn_pools(pools, host, iterations, pids) do
    pid = spawn_link(__MODULE__, :siege, [host, iterations])
    spawn_pools(pools-1, host, iterations, [pid|pids])
  end

end

So if another module calls Loader.spawn_pools(10, host, iterations, []), it will spawn 10 processes, each executing the siege function.

The problem is that I want this to be as parallel as possible: I want all the processes to start executing at the very same moment.

So I thought of this:

def siege(host, iterations) do
  receive do
    {:launch} ->
      # ... actually run the siege against host here
      :ok
  end
end

But this just brings me back to the same problem: now I need to send :launch to all these processes at the same time, which leads to a kind of recursion, another layer of the same problem.
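A minimal runnable sketch of the idea described above, in Elixir. Assumptions not taken from the question: launch!/1 simply sends {:launch} to every pid, and siege/3 takes an extra owner pid and, instead of running a real siege, reports back the moment it was released. Note that launch!/1 still sends the messages one after another, which is exactly the regress the question describes.

```elixir
defmodule Loader do
  # Base case: every pool is spawned and waiting, so release them all.
  def spawn_pools(0, _host, _iterations, pids), do: launch!(pids)

  # Spawn one linked process per pool; each blocks in siege/3 until released.
  def spawn_pools(pools, host, iterations, pids) do
    pid = spawn_link(__MODULE__, :siege, [self(), host, iterations])
    spawn_pools(pools - 1, host, iterations, [pid | pids])
  end

  # Hypothetical launch!/1: sends {:launch} to each pid in turn.
  # These sends are sequential, one message at a time.
  def launch!(pids) do
    Enum.each(pids, &send(&1, {:launch}))
    pids
  end

  # Placeholder siege: instead of hitting `host`, report back the
  # monotonic time (in microseconds) at which this process was released.
  def siege(owner, _host, _iterations) do
    receive do
      {:launch} ->
        send(owner, {:launched, self(), System.monotonic_time(:microsecond)})
    end
  end
end
```

For example, Loader.spawn_pools(10, "localhost", 100, []) returns the 10 pids, and the caller then receives one {:launched, pid, time} message per process.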

P.S. I'm new to the Erlang/Elixir paradigm, so maybe I'm missing something?

Upvotes: 1

Views: 814

Answers (2)

Greg

Reputation: 8340

The closest you can get is using a list comprehension. It's a language construct and therefore could theoretically be compiled to execute in parallel (in practice it isn't, for reasons described below). See how the parallel_eval function is written in an official Erlang library. It essentially does something like this:

[spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL]

An example of this can be seen in my Erlang code.
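For reference, roughly the same pattern in Elixir might look like the sketch below. ParEval.parallel_eval/2 is a hypothetical helper, not rpc:parallel_eval/1 itself: it takes a fun instead of a module/function pair and waits for each reply without a timeout, so a fun that crashes would leave the caller blocked.

```elixir
defmodule ParEval do
  # Spawn one process per argument with a comprehension. Each process sends
  # {its_pid, :promise_reply, result} back, mirroring the Erlang one-liner.
  # The spawning itself still happens sequentially in the caller.
  def parallel_eval(fun, args) do
    reply_to = self()

    pids =
      for a <- args do
        spawn(fn -> send(reply_to, {self(), :promise_reply, fun.(a)}) end)
      end

    # Collect replies in argument order by matching on each pid.
    for pid <- pids do
      receive do
        {^pid, :promise_reply, result} -> result
      end
    end
  end
end
```

ParEval.parallel_eval(fn x -> x * x end, [1, 2, 3]) returns [1, 4, 9]. Matching on each pid keeps the results in argument order even if the processes finish out of order.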

If you think about it, it's impossible to start executing several processes exactly in parallel, because at the lowest level the physical CPU has to start executing each process sequentially. The Erlang VM needs to allocate a stack for the new process, which, according to the documentation, takes 309 words of memory. Then it needs to pass the initial parameters, add the process to the scheduler, etc. See also this thread, which contains more technical references explaining Erlang processes.
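To see this per-process cost on your own system, a small sketch like the following spawns an idle process and asks the VM how big it is. ProcCost is a hypothetical module name; Process.info/2 reports :memory in bytes, so the sketch divides by the word size. The exact figure depends on the OTP release and build, so don't expect it to match 309 words precisely.

```elixir
defmodule ProcCost do
  # Spawn an idle process, read its memory footprint, then let it exit.
  # Returns {bytes, words}.
  def fresh_process_memory do
    pid = spawn(fn -> receive do: (:stop -> :ok) end)
    {:memory, bytes} = Process.info(pid, :memory)
    send(pid, :stop)
    {bytes, div(bytes, :erlang.system_info(:wordsize))}
  end
end
```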

EDIT:

You can benchmark how long it takes to create one process. This simple code is a quick stab at two approaches:

-module(spawner).

-export([start1/1, start2/1]).

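start1(N) ->
    %% Build a tree N generations deep, so it has exactly 4^N leaves,
    %% matching the number of replies loop/3 waits for.
    start_new1(erlang:monotonic_time(), self(), N),
    loop(round(math:pow(4, N)), 0, []).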

start_new1(Start, Pid, N) ->
    Fun = fun() -> child(Start, Pid, N-1) end,
    [spawn(Fun) || _ <- lists:seq(1, 4)].

child(Start, Pid, 0) -> send_diff(Start, Pid);
child(Start, Pid, N) -> start_new1(Start, Pid, N).

loop(All, All, Acc) ->
    {All, lists:sum(Acc)/All, lists:min(Acc), lists:max(Acc)};
loop(All, Y, Acc) ->
    receive Time -> loop(All, Y+1, [Time|Acc]) end.

send_diff(Start, Pid) ->
    Diff = erlang:monotonic_time() - Start,
    Pid ! erlang:convert_time_unit(Diff, native, microsecond).


start2(N) ->
    All = round(math:pow(4, N)),
    Pid = self(),
    Seq = lists:seq(1, All),
    Start = erlang:monotonic_time(),

    Fun = fun() -> send_diff(Start, Pid) end,
    [spawn(Fun) || _ <- Seq],
    loop(All, 0, []).

start1/1 spawns a tree of processes: each process spawns 4 child processes. The argument is the number of generations, i.e. there will be 4^N leaf processes (256 for N=4). start2/1 spawns the same effective number of processes, but sequentially, one by one. In both cases the output is {NumberOfProcesses, Average, Min, Max}: the average, minimum, and maximum time, in microseconds, from the start until each process (the leaf, in the tree case) runs.

1> c(spawner).
{ok,spawner}
2> spawner:start1(4).
{256,868.8671875,379,1182}
3> spawner:start2(4).
{256,3649.55859375,706,4829}
4> spawner:start2(5).
{1024,2260.6494140625,881,4529}

Note that in start1, apart from the leaf processes, there will be many more supporting processes that only live to spawn children. The time from the start until each leaf child runs seems to be shorter in the first case, but in my environment it didn't finish in a reasonable time for N=5. Still, you could take this idea (or something similar) and tune N and the number of child processes spawned by each process to your needs.
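As one way to do that tuning, here is a hypothetical Elixir port of start1/1 with the fan-out made a parameter (TreeSpawner.run/2 is an illustrative name, not an existing API). As in the Erlang version, only the leaves report the delay since the start, in microseconds, and the result has the same {Count, Average, Min, Max} shape.

```elixir
defmodule TreeSpawner do
  # Build a spawn tree `depth` generations deep in which every node spawns
  # `fanout` children; only the fanout^depth leaves report back.
  def run(fanout, depth) when fanout > 0 and depth > 0 do
    start = System.monotonic_time()
    spawn_children(start, self(), fanout, depth)
    leaves = round(:math.pow(fanout, depth))
    times = collect(leaves, [])
    {leaves, Enum.sum(times) / leaves, Enum.min(times), Enum.max(times)}
  end

  defp spawn_children(start, owner, fanout, depth) do
    for _ <- 1..fanout do
      spawn(fn -> child(start, owner, fanout, depth - 1) end)
    end
  end

  # Depth 0 means leaf: report microseconds elapsed since `start`.
  defp child(start, owner, _fanout, 0) do
    diff = System.monotonic_time() - start
    send(owner, {:leaf_delay, System.convert_time_unit(diff, :native, :microsecond)})
  end

  # Inner node: its only job is to spawn the next generation.
  defp child(start, owner, fanout, depth), do: spawn_children(start, owner, fanout, depth)

  # Wait for exactly `n` leaf reports.
  defp collect(0, acc), do: acc

  defp collect(n, acc) do
    receive do
      {:leaf_delay, us} -> collect(n - 1, [us | acc])
    end
  end
end
```

TreeSpawner.run(4, 4) corresponds to spawner:start1(4); the timings will of course differ per machine.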

Upvotes: 0

whatyouhide

Reputation: 16781

Erlang and Elixir execute code sequentially within each process; since processes are spawned by other processes, the act of spawning is inherently sequential. There's no way to synchronize the spawning of more than one process. Sending a message to each process to "synchronize" the start of their jobs has the same problem: sending messages is sequential too, so the main process still sends them one at a time. Even if you distribute the spawning/message sending over multiple processes, guaranteeing that they all start at exactly the same time is basically impossible.
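A quick way to observe this is to release already spawned processes with a broadcast and measure how far apart they actually wake up. This is a hedged sketch: LaunchSpread and the :go message are illustrative names, and the result depends heavily on the number of schedulers and on system load.

```elixir
defmodule LaunchSpread do
  # Spawn `n` processes that block until they receive :go, send :go to each
  # one in turn, and return the gap in microseconds between the first and
  # the last process waking up.
  def measure(n) when n > 0 do
    owner = self()

    pids =
      for _ <- 1..n do
        spawn(fn ->
          receive do
            :go -> send(owner, {:woke, System.monotonic_time(:microsecond)})
          end
        end)
      end

    # Sequential sends: this loop is the bottleneck described above.
    Enum.each(pids, &send(&1, :go))

    times =
      for _ <- pids do
        receive do
          {:woke, t} -> t
        end
      end

    Enum.max(times) - Enum.min(times)
  end
end
```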

However, both message sending and process spawning are very fast operations, so the problem is usually small.

One solution could be to get the current timestamp before spawning any process and pass it to every new process: each process then gets its own current timestamp, subtracts the initial one, and thus learns how much later it was spawned. You can use this information with things like :timer.sleep/1 to try to emulate a synchronized start, but it's still subject to varying degrees of clock precision and whatnot :).
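A minimal sketch of that timestamp idea, assuming a fixed target delay: each child computes how late it was spawned relative to t0 and sleeps only for the remainder, so all children aim to start at roughly t0 + delay_ms. DelayedStart and delay_ms are hypothetical names; :timer.sleep/1 has millisecond granularity, so the start times will still spread out somewhat.

```elixir
defmodule DelayedStart do
  # Record t0 before spawning, hand it to every child, and let each child
  # sleep for (delay_ms - its own lag). Returns the spread in microseconds
  # between the earliest and latest child actually starting its work.
  def run(n, delay_ms) when n > 0 and delay_ms >= 0 do
    owner = self()
    t0 = System.monotonic_time(:millisecond)

    for _ <- 1..n do
      spawn(fn ->
        # Approximate (millisecond-resolution) lag since t0.
        lag = System.monotonic_time(:millisecond) - t0
        :timer.sleep(max(delay_ms - lag, 0))
        send(owner, {:began, System.monotonic_time(:microsecond)})
      end)
    end

    times =
      for _ <- 1..n do
        receive do
          {:began, t} -> t
        end
      end

    Enum.max(times) - Enum.min(times)
  end
end
```

DelayedStart.run(100, 50) returns the spread in microseconds between the earliest and latest child start.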

Upvotes: 3
