Reputation: 1273
One of the RabbitMQ nodes went down but remained in a hung state, which prevented the other node from becoming the active node. We have a VIP through which applications connect to RabbitMQ; if one of the nodes goes down, the VIP switches to the other node. That did not happen here, because the first node hung instead of going down cleanly.
Below are the logs from that time.
[The active node that hung]
=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.253.0>
registered_name: rabbit_disk_monitor
exception exit: {eagain,[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},
[stream]],
[]},
{os,start_port_srv_handle,1,
[{file,"os.erl"},{line,313}]},
{os,start_port_srv_loop,0,
[{file,"os.erl"},{line,329}]}]}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
messages: []
links: [<0.252.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 2036096302
neighbours:
[The other node which should have taken over]
=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.253.0>
registered_name: rabbit_disk_monitor
exception exit: {eagain,[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},
[stream]],
[]},
{os,start_port_srv_handle,1,
[{file,"os.erl"},{line,313}]},
{os,start_port_srv_loop,0,
[{file,"os.erl"},{line,329}]}]}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
messages: []
links: [<0.252.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 2036096302
neighbours:
Could using the delayed message plugin and Mnesia have caused this?
Upvotes: 1
Views: 766
Reputation: 22750
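eagain
means "Resource temporarily unavailable".
Most likely the node ran out of file descriptors. Since the error was raised by erlang:open_port while spawning /bin/sh, the limit on the number of OS processes for the RabbitMQ user is worth checking as well.
Check your current file descriptor limit and try increasing it.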
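As a rough way to check those limits, here is a minimal Python sketch, assuming a Linux host with /proc mounted. The function names (own_limits, parse_proc_limits, inspect_pid) are made up for illustration and are not part of RabbitMQ or Erlang. It reports the open-file and process limits of a running process and how many descriptors it currently holds.

```python
import os
import re
import resource


def own_limits():
    """Soft/hard limits of the current process (inherited by its children)."""
    report = {"open files": resource.getrlimit(resource.RLIMIT_NOFILE)}
    if hasattr(resource, "RLIMIT_NPROC"):  # not defined on every platform
        report["processes"] = resource.getrlimit(resource.RLIMIT_NPROC)
    return report


def parse_proc_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = re.split(r"\s{2,}", line.strip())  # columns are space padded
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def inspect_pid(pid):
    """Limits and open descriptor count for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        limits = parse_proc_limits(handle.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return {
        "open files": limits.get("Max open files"),
        "processes": limits.get("Max processes"),
        "open_fds": open_fds,
    }


if __name__ == "__main__":
    print(own_limits())
    print(inspect_pid(os.getpid()))  # replace with the broker's OS pid
```

To look at the broker itself, pass the OS pid of the Erlang VM to inspect_pid; reading another user's /proc/<pid>/fd usually requires root or running as that user. If open_fds is close to the soft "open files" limit, or the "processes" limit is low, raising the limits for the RabbitMQ service is a reasonable next step.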
Check these links for more details:
Upvotes: 1