Erlang process dies after disconnect

Question

I have the following setup:

Two servers with the FQDNs usa.local and gca.local
One Erlang node on each of them, named alice@usa.local and bob@gca.local

When I start Alice (alice:start/0) on alice@usa.local, it spawns a linked Bob (bob:start/1) on bob@gca.local. Both processes trap exits.

When Alice dies for any reason, Bob gets notified and keeps running. When Bob dies for any reason, Alice gets notified and keeps running.

When I cut the network connection, Alice gets notified that Bob has died of noconnection, and the bob process dies on bob@gca.local.

I do not want this to happen. I want Bob to keep running even when it loses its connection to Alice.

My questions are:

Does this have something to do with the fact that I initially spawn Bob from the Alice node?
How can I make Bob survive a connection loss?

Here goes the code:

-module (alice).
-compile (export_all).

start () ->
    register (alice, spawn (fun init/0) ).

stop () ->
    whereis (alice) ! stop.

init () ->
    process_flag (trap_exit, true),
    Bob = spawn_link ('bob@gca.local', bob, start, [self () ] ),
    loop (Bob).

loop (Bob) ->
    receive
        stop -> ok;
        {'EXIT', Bob, Reason} ->
            io:format ("Bob died of ~p.~n", [Reason] ),
            loop (Bob);
        Msg ->
            io:format ("Alice received ~p.~n", [Msg] ),
            loop (Bob)
    end.

-module (bob).
-compile (export_all).

start (Alice) ->
    process_flag (trap_exit, true),
    register (bob, self () ),
    loop (Alice).

loop (Alice) ->
    receive
        stop -> ok;
        {'EXIT', Alice, Reason} ->
            io:format ("Alice died of ~p.~n", [Reason] ),
            loop (Alice);
        Msg ->
            io:format ("Bob received ~p.~n", [Msg] ),
            loop (Alice)
    after 5000 ->
        Alice ! "Hi, this Bob",
        loop (Alice)
    end.

hdima · Accepted Answer

The problem is the io:format/2 call on line 13 of bob.erl. When the new process is created by spawn_link('bob@gca.local', ...), it inherits the group leader of the alice process, which is local to alice@usa.local, so all of bob's output appears on the alice@usa.local terminal. When alice@usa.local is disconnected, bob handles the 'EXIT' message on line 12 of bob.erl, but the io:format/2 call on line 13 fails because the group leader is no longer reachable, and that failure is what kills bob.

The quick fix is to change all of bob's io:format/2 calls to io:format(user, Format, Data). Then all of bob's output is displayed on the bob@gca.local terminal.
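As a minimal sketch of that quick fix, here is bob.erl from the question with every io:format/2 call replaced by io:format/3 on the user device. Only the output calls and the export list differ from the original; it assumes the node has the usual locally registered user I/O process.

```erlang
-module(bob).
-export([start/1]).

start(Alice) ->
    process_flag(trap_exit, true),
    register(bob, self()),
    loop(Alice).

loop(Alice) ->
    receive
        stop -> ok;
        {'EXIT', Alice, Reason} ->
            %% 'user' is the I/O server registered on bob's own node,
            %% so this call does not depend on alice's group leader.
            io:format(user, "Alice died of ~p.~n", [Reason]),
            loop(Alice);
        Msg ->
            io:format(user, "Bob received ~p.~n", [Msg]),
            loop(Alice)
    after 5000 ->
        Alice ! "Hi, this Bob",
        loop(Alice)
    end.
```

Alice can keep starting Bob exactly as before, with spawn_link('bob@gca.local', bob, start, [self()]).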

However, in real projects you really should use the gen_server behavior, because it handles many rough cases, especially for inter-node communication (don't forget to look at the code). Moreover, you should use monitor/2 and/or monitor_node/2 instead of link and trap_exit here.
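To illustrate the monitor-based approach, here is a rough sketch of a Bob that watches Alice with erlang:monitor/2 instead of a link. The module name bob_mon and the idle/0 helper are hypothetical names chosen for this example, not part of the original code, and a production version would more likely be a gen_server.

```erlang
-module(bob_mon).
-export([start/1, init/1]).

%% Hypothetical entry point: spawns Bob without linking it to the caller.
start(Alice) ->
    spawn(?MODULE, init, [Alice]).

init(Alice) ->
    %% A monitor is one-way: Bob is told when Alice goes away,
    %% but no exit signal is ever sent to Bob.
    Ref = erlang:monitor(process, Alice),
    loop(Alice, Ref).

loop(Alice, Ref) ->
    receive
        stop -> ok;
        {'DOWN', Ref, process, Alice, Reason} ->
            %% Reason is noconnection when the link to Alice's node is lost.
            io:format(user, "Alice is down: ~p.~n", [Reason]),
            idle();
        Msg ->
            io:format(user, "Bob received ~p.~n", [Msg]),
            loop(Alice, Ref)
    end.

%% Keep running after Alice is gone, until explicitly stopped.
idle() ->
    receive
        stop -> ok;
        Msg ->
            io:format(user, "Bob received ~p.~n", [Msg]),
            idle()
    end.
```

From Alice's side this could be started with spawn('bob@gca.local', bob_mon, init, [self()]), with Alice in turn calling erlang:monitor(process, Bob) if she wants to be notified about Bob.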
