I'm trying to make an RPC call to another node from a script. Everything works when using "shortnames" but fails when using "longnames".
My local machine name is "Pandora". After starting a detached node,
erl -detached -noshell -name 'node1@Pandora' -setcookie pandemic
running this script with name_domain => shortnames,
#!/usr/bin/env escript
-mode(compile).

-define(THIS_NODE, 'testnode@Pandora').
-define(THAT_NODE, 'node1@Pandora').

show(R) -> io:format("~p~n", [R]).

main(_) ->
    net_kernel:start(?THIS_NODE, #{name_domain => shortnames}),
    erlang:set_cookie(?THAT_NODE, pandemic),
    show( erl_epmd:names() ),
    show( net_adm:names() ),
    show( net_adm:ping(?THAT_NODE) ),
    show( rpc:call(?THAT_NODE, erlang, time, []) ).
works correctly and produces this:
{ok,[{"node1",40965},{"testnode",35319}]}
{ok,[{"node1",40965},{"testnode",35319}]}
pong
{14,1,21}
However, when I change it to name_domain => longnames (to simulate working in a distributed environment):
net_kernel:start(?THIS_NODE, #{name_domain => longnames}),
The test fails with an error report:
=ERROR REPORT==== 6-Oct-2024::14:09:07.875238 ===
** System running to use fully qualified hostnames **
** Hostname Pandora is illegal **
Clearly, "Pandora" is not an FQDN, so I attempted to solve this by creating a local Inet configuration file called erl_inetrc and setting the local domain to that of my office router:
{domain, "myoffice.loc"}.
And did so in my test script, as well:
-define(THIS_NODE, 'testnode@Pandora.myoffice.loc').
-define(THAT_NODE, 'node1@Pandora.myoffice.loc').
I then set the location of the file in the ERL_INETRC environment variable:
export ERL_INETRC="$(pwd)/erl_inetrc"
Sadly, this resulted in my test script freezing up at the net_adm:names() command. erl_epmd:names() worked, however, which is odd given that net_adm:names is supposed to call erl_epmd:names.
Does anybody have any idea why net_adm:names() freezes up?
Turns out there is a much better answer:
sudo vi /etc/wsl.conf
and add the fully-qualified hostname
to the [network]
section:
[network]
hostname="pandora.wsl"
You will need to restart WSL2 for this to take effect. It is best to restart your computer, but this will work, too (see step#2).
The local network configuration file required in the original answer is no longer necessary.
Do not call erl_epmd:names() as it will now freeze up. Use net_adm:names() instead (see Notes, below).
To confirm the change with my test script, this line must be removed or commented out to prevent it freezing up:
show( erl_epmd:names() ),
This gets us nearly all the way to what we'd expect from running Erlang/OTP in a native Linux or Windows OS. But it doesn't quite fix everything. I'm guessing the remaining issues are an implementation side-effect of WSL, which resolves hostname to the localhost IP address of 127.0.0.1.
Since we are making hostname fully-qualified, instead of setting domainname separately, the hostname will no longer resolve without the domain name (some have tried setting domainname, to no effect).
In my question, I noted that net_adm:names() froze after assigning the domain in a local config file. Yet erl_epmd:names() still worked.
After removing the local config file and making hostname fully-qualified, erl_epmd:names() freezes while net_adm:names() works correctly.
In my original answer, I noted that using erl_call to terminate a node:
erl_call -name 'node1@Pandora.wsl' -c pandemic -q
failed with an error:
erl_call: can't ei_gethostbyname(Pandora.wsl)
but the command did work using the short name:
erl_call -sname 'node1' -c pandemic -q
After the change, using -name 'node1@Pandora.wsl' works without error, and using -sname node1 freezes up.
erl_epmd:names() retrieves the host name from inet:gethostname(), which strips the domain from the fully-qualified WSL hostname. This worked fine with the default hostname but now freezes because that name no longer resolves.
net_adm:names() retrieves the fully-qualified hostname from net_adm:localhost(), which appends the domain name to the host name returned from inet:gethostname(). That worked before I made any changes because there was no domain name to append. After I added the local config file, WSL didn't resolve Pandora.wsl. But now it does.
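One way to see which of the two lookups is likely to hang is to resolve both names directly before calling either function. The escript below is only a minimal sketch of that check: it prints the name from inet:gethostname(), the name from net_adm:localhost(), and whether each resolves through inet:getaddr/2. The helper resolve/1 is a hypothetical name of my own, and the sketch assumes IPv4.

```erlang
#!/usr/bin/env escript
%% Diagnostic sketch: show the two host names discussed above and whether
%% each one resolves to an IPv4 address. resolve/1 is a hypothetical helper.

main(_) ->
    {ok, Short} = inet:gethostname(),   % host name as inet reports it
    Long = net_adm:localhost(),         % host name as net_adm reports it
    lists:foreach(fun resolve/1, [Short, Long]).

resolve(Host) ->
    case inet:getaddr(Host, inet) of
        {ok, Addr}      -> io:format("~s -> ~p~n", [Host, Addr]);
        {error, Reason} -> io:format("~s -> error: ~p~n", [Host, Reason])
    end.
```

If one of the two names reports an error here, that is probably the name behind the call that freezes.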
It's worth noting that erl_epmd:names/0 is not included in the API docs and is probably not intended for use by the public. Calling erl_epmd:names/1 with the WSL fully-qualified hostname works fine.
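As a small illustration of that last point, the sketch below passes the host name explicitly instead of relying on the default. The value "pandora.wsl" is the hostname from the wsl.conf example above, so substitute your own; this is an assumption about your setup, not a general recipe.

```erlang
#!/usr/bin/env escript
%% Sketch: query epmd by an explicit, fully-qualified host name.
%% "pandora.wsl" is the hostname set in /etc/wsl.conf in this answer.

main(_) ->
    Host = "pandora.wsl",
    io:format("~p~n", [erl_epmd:names(Host)]).
```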
This is certainly the better solution but it would be best for WSL to distinguish the host name separately from the domain name, as any other Linux distribution would. If anyone has figured out a way to make that happen, reliably, please share.
The solution turned out to be exceedingly simple but here's how I came to it:
Hostnames must resolve to an IP address. The documentation is not thoroughly clear on this, but researching the net_adm:names() code confirmed it.
WSL2 runs in a virtual machine with NAT networking where the IP address assigned to the host is not the one assigned by the local router in the local domain.
When "shortnames" are used, everything runs on localhost which will always resolve correctly. For "longnames", hostnames must be fully-qualified with a domain and the FQDN must resolve to an IP address.
My office network address for myoffice.loc is 10.1.1.0/24. My WSL2 network is 172.16.32.0/24. When net_adm:names() resolved pandora.myoffice.loc to 10.1.1.119, it could not bind to the port used by the epmd daemon. But it failed to report any error and simply froze up.
It turns out that setting the local domain to one that won't be found in DNS makes all hosts resolve to the localhost IP of 127.0.0.1. I have no idea why, but it fixed my problem.
So I changed my local config file erl_inetrc to use a non-existent domain,
{domain, "wsl"}.
Changed the node domains in my test script,
-define(THIS_NODE, 'testnode@Pandora.wsl').
-define(THAT_NODE, 'node1@Pandora.wsl').
Started my detached node with the FQDN,
erl -detached -noshell -name 'node1@Pandora.wsl' -setcookie pandemic
And everything works as it should:
{ok,[{"node1",37267},{"testnode",39771}]}
{ok,[{"node1",37267},{"testnode",39771}]}
pong
{20,1,55}
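If the domain setting does not seem to take effect, it may help to confirm that the runtime actually picked up the ERL_INETRC file. The following is a rough sketch that prints the environment variable and the Inet configuration as reported by inet:get_rc(); I am assuming the {domain, ...} entry shows up in that list when the file has been loaded.

```erlang
#!/usr/bin/env escript
%% Sketch: print the Inet configuration the runtime loaded, to check
%% whether the {domain, ...} entry from the ERL_INETRC file is present.

main(_) ->
    io:format("ERL_INETRC=~p~n", [os:getenv("ERL_INETRC")]),
    io:format("~p~n", [inet:get_rc()]).
```

Run it from the same shell in which ERL_INETRC was exported.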
Additional Notes
My detached test node is normally terminated from the command line with erl_call. In this case:
erl_call -name 'node1@Pandora.wsl' -c pandemic -q
But when I ran this command with my dummy domain name, it produced an error:
erl_call: can't ei_gethostbyname(Pandora.wsl)
However, the same command will work by using the short name (go figure):
erl_call -sname 'node1' -c pandemic -q
I wasn't able to make WSL2 return the IP it assigns to the host for
[email protected]
. I found a way around it but it will obviously only work for
Erlang/OTP.
While I didn't use any of them, here are a couple of networking solutions that show promise.