Arkaitz Jimenez

Reputation: 23178

Erlang/OTP, how to signal an application startup error without a crash

I have an application with a process that requires some gen_servers to be alive elsewhere in the cluster.
If they are up, it just works. If they are not, my gen_server fails in init with {error,Reason}, and this propagates through my supervisor into my application's start function.
The problem is that if I return anything other than {ok,Pid}, I get a crash report.

My intention is to signal that the application couldn't start properly, that all its processes are down, and that the application should therefore not be considered active. However, I can only choose between returning {ok, self()} and seeing my application listed as active when it is not, or returning {error, Error} and seeing it crash with:

{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},{ancestors,[rtb_sup,<0.134.0>]},
{messages,[]},{links,[<0.135.0>]},{dictionary,[]},{trap_exit,false},{status,running},
{heap_size,377},{stack_size,24},{reductions,255}],[]]:\n"

The problem seems bigger than this: basically, there is no way to tell the application framework that the app failed. It may look like one of those things that Erlang handles by letting the process die, but allowing an {error, } return value from application:start seems like a good tradeoff.

Any hints?

Upvotes: 2

Views: 284

Answers (1)

Chen Yu

Reputation: 4077

An application can crash at any moment, so checking its dependencies at start time cannot give you useful information about crashes that happen later.

I have read part of the RabbitMQ source code; it is also a cluster-based program.

I think RabbitMQ faced a problem similar to yours, because the cluster needs to collect each related node's application "is alive" information and memory high-watermark information, and then make decisions based on it.

Its solution is:

  1. Register the first main process of the application locally on the node. In RabbitMQ the name is "rabbit"; you can find this in rabbit.erl, in the function "start/2".

    start(normal, []) ->
        case erts_version_check() of
            ok ->
                {ok, SupPid} = rabbit_sup:start_link(),
                true = register(rabbit, self()),
                print_banner(),
                [ok = run_boot_step(Step) || Step <- boot_steps()],
                io:format("~nbroker running~n"),
                {ok, SupPid};
            Error ->
                Error
        end.

  2. Four other modules (rabbit_node_monitor.erl, rabbit_memory_monitor.erl, vm_memory_monitor.erl and rabbit_alarm.erl) use two Erlang techniques: a process monitor, to receive the "DOWN" message of the registered process, and an alarm handler, to collect this information.
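To illustrate the monitoring half of this idea, here is a minimal sketch; it is not RabbitMQ's actual code. The module name app_watch and both function names are hypothetical. It only assumes that the application registers its main process under a known name, as "rabbit" is registered above, and relies on the standard whereis/1 and erlang:monitor/2 BIFs.

```erlang
-module(app_watch).
-export([is_running/1, wait_down/2]).

%% Hypothetical helper: true if a process is currently registered
%% under Name on the local node.
is_running(Name) when is_atom(Name) ->
    whereis(Name) =/= undefined.

%% Hypothetical helper: monitor the process registered as Target and
%% block until it goes down or Timeout milliseconds elapse.
%% Target is either a locally registered name (an atom) or a
%% {Name, Node} tuple for a process registered on another node.
wait_down(Target, Timeout) ->
    Ref = erlang:monitor(process, Target),
    receive
        {'DOWN', Ref, process, _Object, Reason} ->
            %% Reason is noproc if nothing was registered under that
            %% name, and noconnection if the remote node is unreachable.
            {down, Reason}
    after Timeout ->
        %% Stop monitoring and drop any 'DOWN' message that raced in.
        erlang:demonitor(Ref, [flush]),
        still_up
    end.
```

For example, app_watch:wait_down({rabbit, Node}, 5000) could be called from a process on another node to learn whether the registered main process is gone. A long-lived watcher would more likely handle the 'DOWN' message in a gen_server's handle_info/2 instead of blocking in receive.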

Upvotes: 1
