billcyz

Reputation: 1389

Can't start process in erlang node

I have two Erlang nodes: node01 is '[email protected]' and node02 is '[email protected]'. I want to start a process on node01 by calling spawn(Node, Mod, Fun, Args) from node02, but I always get a useless pid.

Node connection is ok:

([email protected])14> net_adm:ping('[email protected]').
pong

The module is in the code path of both node01 and node02:

([email protected])7> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options:  []
Exports: 
         init/1
         module_info/0
         module_info/1
         start/0
ok

([email protected])20> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options:  []
Exports: 
         init/1
         module_info/0
         module_info/1
         start/0
ok

However, the spawn is not successful:

([email protected])21> spawn('[email protected]', remote_process, start, []). 
I'm on node '[email protected]'
<9981.89.0>
My pid is <9981.90.0>

([email protected])8> whereis(remote_process).
undefined

The process is able to run on the local node:

([email protected])18> remote_process:start().
I'm on node '[email protected]'
My pid is <0.108.0>
{ok,<0.108.0>}

([email protected])24> whereis(remote_process).
<0.115.0>

But it fails on the remote node. Can anyone give me an idea why?

Here is the source code remote_process.erl:

-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1]).

start() ->
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.

init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.

Upvotes: 0

Views: 731

Answers (1)

Pascal

Reputation: 14042

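You are registering your process globally, which is what you need for your purpose. However, whereis/1 only looks in the local registry of the node where it is called. The function to retrieve a globally registered process is global:whereis_name(remote_process).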
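As a minimal sketch of the difference between the two registries, the following module registers a throwaway process globally and looks it up both ways. The module name registry_demo and the name demo_name are placeholders for illustration, not part of the question's code.

```erlang
-module(registry_demo).
-export([run/0]).

%% Registers one process in the global registry only and compares
%% the result of a local lookup with a global lookup.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    yes = global:register_name(demo_name, Pid),
    Local = whereis(demo_name),               % local registry: not found
    Global = global:whereis_name(demo_name),  % global registry: found
    %% Clean up so the sketch can be run more than once.
    global:unregister_name(demo_name),
    Pid ! stop,
    {Local, Global =:= Pid}.
```

Calling registry_demo:run() should return {undefined, true}: the local lookup misses the process even though it is alive and registered globally.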

Edit: global:whereis_name/1 returns the pid if

  • the 2 nodes are connected (check with nodes())
  • the process is registered with the global module
  • the process is still alive

If any of these conditions is not satisfied, you will get undefined.
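The checks above can be sketched as a small helper. The module name global_check and the function diagnose/2 are hypothetical; only nodes/0, node/0 and global:whereis_name/1 are standard calls. Note that the global registry drops the name of a process that has died, so "not registered" and "no longer alive" both show up as undefined.

```erlang
-module(global_check).
-export([diagnose/2]).

%% diagnose(Node, Name) -> {ok, Pid} | {error, Reason}
%% Node is the node expected to host the process,
%% Name is the globally registered name to look up.
diagnose(Node, Name) ->
    Connected = Node =:= node() orelse lists:member(Node, nodes()),
    case Connected of
        false ->
            {error, not_connected};
        true ->
            case global:whereis_name(Name) of
                undefined -> {error, not_registered_or_dead};
                Pid when is_pid(Pid) -> {ok, Pid}
            end
    end.
```

For example, global_check:diagnose(Node, remote_process), called from node02 with the name of node01 as Node, would tell a missing connection apart from a missing registration.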


Edit 2: Start node 1 with werl -sname p1 and type in the shell:

(p1@W7FRR00423L)1> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)2> remote_process:start().
I'm on node p1@W7FRR00423L
My pid is <0.69.0>
{ok,<0.69.0>}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
<0.69.0>
(p1@W7FRR00423L)4> 

Then start a second node with werl -sname p2 and type in the shell (it is fine to connect the second node later; the global registration is updated when necessary):

(p2@W7FRR00423L)1> net_kernel:connect_node(p1@W7FRR00423L).
true
(p2@W7FRR00423L)2> nodes().
[p1@W7FRR00423L]
(p2@W7FRR00423L)3> global:whereis_name(remote_process).
<7080.69.0>
(p2@W7FRR00423L)4> 

Edit 3:

In your test you are spawning a process P1 on the remote node which executes the function remote_process:start/0.

This function calls supervisor:start_link/3, which spawns a new supervisor process P2 and links it to P1. After this, P1 has nothing left to do, so it exits. A supervisor terminates when the process that started it exits, so the linked process P2 dies too, and you get an undefined reply to the global:whereis_name call.

In my test, I start the process from the shell of the remote node; the shell does not die after I evaluate remote_process:start/0, so the supervisor process does not die and global:whereis_name finds the requested pid.
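The effect can be reproduced on a single node. The following is a minimal sketch, with the hypothetical module name link_demo, in which a short-lived process plays the role of P1 and the caller monitors the supervisor it started. The one-second timeout is an arbitrary choice for the demonstration.

```erlang
-module(link_demo).
-behaviour(supervisor).
-export([run/0, init/1]).

%% Starts a supervisor from a short-lived process and reports
%% whether the supervisor outlives that process.
run() ->
    Caller = self(),
    spawn(fun() ->
        %% Plays the role of P1: start the supervisor, then return.
        {ok, SupPid} = supervisor:start_link(?MODULE, []),
        Caller ! {started, SupPid}
    end),
    receive
        {started, Sup} ->
            Ref = erlang:monitor(process, Sup),
            receive
                {'DOWN', Ref, process, Sup, Reason} ->
                    {supervisor_down, Reason}
            after 1000 ->
                supervisor_still_alive
            end
    end.

%% Supervisor callback: no children, same flags as in the question.
init([]) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

Calling link_demo:run() should return a {supervisor_down, Reason} tuple rather than supervisor_still_alive, because the supervisor stops as soon as the process that started it is gone.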

If you want the supervisor to survive the call, you need an intermediate process that is spawned without a link, so that it does not die with its parent. Here is a small example based on your code:

-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1,local_spawn/0,remote_start/1]).

remote_start(Node) ->
    spawn(Node,?MODULE,local_spawn,[]).

local_spawn() ->
    % spawn without a link so that start_wait_stop survives
    % the death of the local_spawn process
    spawn(fun start_wait_stop/0).

start_wait_stop() ->
    start(),
    receive
        stop -> ok
    end.

start() ->
    io:format("start (~p)~n",[self()]),
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.

init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.

In the shell on node 1 you get:

(p1@W7FRR00423L)1> net_kernel:connect_node(p2@W7FRR00423L).
true
(p1@W7FRR00423L)2> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
undefined
(p1@W7FRR00423L)4> remote_process:remote_start(p2@W7FRR00423L).
<7080.68.0>
start (<7080.69.0>)
I'm on node p2@W7FRR00423L
My pid is <7080.70.0>
(p1@W7FRR00423L)5> global:whereis_name(remote_process).        
<7080.70.0>
(p1@W7FRR00423L)6> global:whereis_name(remote_process).
undefined

and on node 2:

(p2@W7FRR00423L)1> global:registered_names(). % before step 4
[]
(p2@W7FRR00423L)2> global:registered_names(). % after step 4
[remote_process]
(p2@W7FRR00423L)3> rp(processes()).
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
 <0.34.0>,<0.35.0>,<0.36.0>,<0.37.0>,<0.38.0>,<0.39.0>,
 <0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,
 <0.46.0>,<0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,
 <0.52.0>,<0.53.0>,<0.54.0>,<0.55.0>,<0.56.0>,<0.57.0>,
 <0.58.0>,<0.62.0>,<0.64.0>,<0.69.0>,<0.70.0>]
ok
(p2@W7FRR00423L)4> pid(0,69,0) ! stop. % between steps 5 and 6
stop
(p2@W7FRR00423L)5> global:registered_names().
[]

Upvotes: 1
