Lalit Mehra

Reputation: 1273

RabbitMQ crashed and remained in hung state

One of our RabbitMQ nodes went down but remained in a hung state, which prevented the other node from becoming the active node. Applications connect to RabbitMQ through a VIP; if one of the nodes goes down, the VIP should switch to the other node. That did not happen here because the first node hung instead of going down cleanly.

Below are the logs from that time.
[The then-active node that hung]

=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.253.0>
    registered_name: rabbit_disk_monitor
    exception exit: {eagain,[{erlang,open_port,
                                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
                                      [stream]],
                                     []},
                             {os,start_port_srv_handle,1,
                                 [{file,"os.erl"},{line,313}]},
                             {os,start_port_srv_loop,0,
                                 [{file,"os.erl"},{line,329}]}]}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
    messages: []
    links: [<0.252.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 2036096302
  neighbours:

[The other node which should have taken over]

=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.253.0>
    registered_name: rabbit_disk_monitor
    exception exit: {eagain,[{erlang,open_port,
                                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
                                      [stream]],
                                     []},
                             {os,start_port_srv_handle,1,
                                 [{file,"os.erl"},{line,313}]},
                             {os,start_port_srv_loop,0,
                                 [{file,"os.erl"},{line,329}]}]}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
    messages: []
    links: [<0.252.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 2036096302
  neighbours:

Could using the delayed message plugin and Mnesia have caused this?

Upvotes: 1

Views: 766

Answers (1)

Gabriele Santomaggio

Reputation: 22750

eagain means "Resource temporarily unavailable".

Most likely you ran out of file descriptors.

Check your current file descriptor limit and try increasing it.
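As a rough way to check this on a Linux host, the sketch below compares the number of descriptors a process has open against its "Max open files" limit, read from /proc. The helper names (parse_max_open_files, fd_usage) are made up for illustration, and the script inspects its own PID only as a stand-in; in practice you would pass the PID of the Erlang VM (beam) running the broker, which is an assumption about your setup.

```python
import os
import resource
from pathlib import Path
from typing import Tuple


def parse_max_open_files(limits_text: str) -> Tuple[str, str]:
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are returned as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Column layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid: int) -> Tuple[int, str]:
    """Return (open descriptor count, soft limit) for a process (Linux only)."""
    proc = Path("/proc") / str(pid)
    open_fds = len(list((proc / "fd").iterdir()))
    soft, _hard = parse_max_open_files((proc / "limits").read_text())
    return open_fds, soft


if __name__ == "__main__":
    # Limit that applies to the current process and the children it starts.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"current process limit: soft={soft} hard={hard}")

    # Hypothetical target: our own PID stands in for the broker's PID.
    if Path("/proc/self/limits").exists():
        used, limit = fd_usage(os.getpid())
        print(f"pid {os.getpid()}: {used} descriptors open, soft limit {limit}")
```

If the open count sits close to the soft limit, raising the limit for the user or service that runs RabbitMQ is the thing to try. The broker also reports its own descriptor usage in the output of rabbitmqctl status, which is a quicker check if the node still responds.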

See these links for more details:

Upvotes: 1
