Understanding supervisor duty in Erlang/Elixir

Question

I wrote a new library called director.
It is a supervisor library.
One of its features is that you can give director a fun of arity 2, and director will call it every time a child process crashes. The first argument is the crash reason and the second is the crash count. For example:

-module(director_test).
-behaviour(director).
-export([start_link/0, init/1]).

start_link() ->
    director:start_link(?MODULE, []).

init([]) ->
    ChildSpec = #{id => foo,
                  start => {m, f, args},
                  plan => [fun my_plan/2],
                  count => infinity},
    {ok, [ChildSpec]}.

my_plan(normal, Count) when Count rem 10 == 0 ->
    %% On every 10th crash with reason normal, director restarts the
    %% process after waiting 3000 milliseconds.
    {restart, 3000};
my_plan(normal, _Count) ->
    %% On any other crash with reason normal, director restarts the process.
    restart;
my_plan(killed, _Count) ->
    %% If the process was killed, director deletes it from its children.
    delete;
my_plan(Reason, _Count) ->
    %% For any other reason, director itself stops with reason
    %% {foo_crashed, Reason}.
    {stop, {foo_crashed, Reason}}.

I announced my library in Slack, and people there were surprised that I would write a new supervisor in this way. Someone said, "I tend to not let the supervisor handle back-off".
In the end they did not give me a clear explanation, and I think I need to know more about supervisors and their duties. My understanding is that a supervisor is a process that should know when to restart a child, when to delete a child, and when not to restart a child. Am I right?

Can you tell me some good features of OTP/Supervisor that Director does not have? (List of director's features)

zxq9 · Accepted Answer

You are mixing the ideas of supervision and management.

Supervision is already a part of OTP. It is the basic idea that:

No process can ever possibly become an orphan.
A crashed process will either be restarted or the crash will abort its supervisor, and this is an architectural decision made before internal logic is written.
Crashes can be logged externally (handled by a process other than whatever failed).
Error handling code, crash forensics, and so on never occur as part of supervision. Ever. (Complex logic leads to complex weirdness, and supervision needs to be simple, robust, and reliable.)
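As a minimal sketch of what "simple and decided up front" looks like with the standard OTP supervisor behaviour: the restart strategy and restart limits are plain data returned from init/1, with no per-crash logic. The module names example_sup and my_worker are hypothetical, and my_worker is assumed to export start_link/0.

```erlang
-module(example_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% The whole restart policy is declared here, before any worker
    %% logic exists: the supervisor gives up (and exits) if more than
    %% 5 restarts happen within 10 seconds.
    SupFlags = #{strategy => one_for_one,
                 intensity => 5,
                 period => 10},
    %% my_worker is a hypothetical module exporting start_link/0.
    ChildSpec = #{id => my_worker,
                  start => {my_worker, start_link, []},
                  restart => transient,  % restart only after an abnormal exit
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.
```

Nothing in this module inspects the crash reason or counts crashes per reason. The only choices are the strategy, the restart type, and the restart intensity.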

Management is something that may or may not be present in your system, so it is left up to you. It is the idea that you would have a single (usually named) process that guides the overall high-level task that your (supervised) workers are doing. Having a manager process gives you a single point of control for the overall effort being done -- which also means it is a single place you can tell that overall effort to start, stop, suspend itself, etc. and this is where you could add additional logic about selective restarts based on some crash condition.
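One possible sketch of such a manager, assuming the policy from the question's my_plan/2 is moved out of the supervisor: a named gen_server asks a supervisor to start the worker, monitors it, and decides what to do when it goes down. All names here are hypothetical (foo_manager, decide/2, foo_sup). foo_sup is assumed to be a simple_one_for_one supervisor whose children use restart => temporary, so it never restarts anything itself.

```erlang
-module(foo_manager).
-behaviour(gen_server).
-export([start_link/0, decide/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Pure policy function: the same decisions as my_plan/2 in the
%% question, but owned by the manager instead of the supervisor.
decide(normal, Count) when Count rem 10 =:= 0 -> {restart, 3000};
decide(normal, _Count) -> restart;
decide(killed, _Count) -> drop;
decide(Reason, _Count) -> {stop, {foo_crashed, Reason}}.

init([]) ->
    {ok, start_worker(#{crashes => 0})}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The monitored worker went down: count it and apply the policy.
handle_info({'DOWN', Ref, process, _Pid, Reason},
            State = #{ref := Ref, crashes := N}) ->
    Count = N + 1,
    NewState = State#{crashes := Count},
    case decide(Reason, Count) of
        restart ->
            {noreply, start_worker(NewState)};
        {restart, Delay} ->
            %% Back-off lives here, not in the supervisor.
            erlang:send_after(Delay, self(), start_worker),
            {noreply, NewState};
        drop ->
            {noreply, NewState};
        {stop, StopReason} ->
            {stop, StopReason, NewState}
    end;
handle_info(start_worker, State) ->
    {noreply, start_worker(State)};
handle_info(_Other, State) ->
    {noreply, State}.

%% Ask the (assumed) simple_one_for_one supervisor foo_sup for a new
%% worker and monitor it so this process receives a 'DOWN' message.
start_worker(State) ->
    {ok, Pid} = supervisor:start_child(foo_sup, []),
    Ref = erlang:monitor(process, Pid),
    State#{pid => Pid, ref => Ref}.
```

With this split the supervisor stays generic, while the selective restart and back-off rules sit in one ordinary process that can be changed and debugged like any other part of the program.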

Think of "supervision" as a low-level, system framework type idea. It is always the same in all programs just like opening a file or handling a network socket would be. Think of management as one discrete chunk of the actual problem your program needs to solve to accomplish its work.

Management may or may not be complex. Supervision must always be uniform and simple. Giving a supervisor too much responsibility makes it difficult to understand and debug, and often leads to business problems -- an overloaded supervisor can be a major problem in a system. Don't burden your supervisors with high-level management tasks.

I wrote an article about the "service -> worker pattern" in Erlang a while back. Hopefully it informs more than it confuses: https://zxq9.com/archives/1311
