Let me explain my current scenario before formulating the questions:
Current Scenario
I have a RabbitMQ cluster with 2 nodes that were started as root, and I also have the web administration plugin installed, which worked perfectly.
A few days ago one of the nodes went down: the consumers of some queues failed and millions of messages accumulated, so RabbitMQ collapsed and wrote everything to disk (`/var/lib/rabbitmq/mnesia/name_of_the_node/queues/`) until the filesystem filled up and the whole node went down.
Problem/Questions
After running `rabbitmq-server -detached`, the cluster kept working, but the administration plugin no longer responded. Is there a way to make it work again without restarting?
I'm not really sure how to do the restart so as to minimize problems, and I also want to guarantee that the web admin plugin will work after the restart.
Option 1:
Stop all the nodes (running as root) --> start all the nodes as the rabbitmq user
Option 2:
Stop node1 (running as root) --> start node1 as the rabbitmq user
Stop node2 (running as root) --> start node2 as the rabbitmq user
I'm also open to hearing any other advice or suggestions you may have.
It is hard to answer your question without more information. You should at least take a look at the log files and/or post them somewhere.
After you have stopped a node that was running as root, change the ownership of the entire `/var/lib/rabbitmq` directory to `rabbitmq:rabbitmq`. Do the same with `/var/log/rabbitmq`. Those are the only places where RabbitMQ writes data with the official packages and the default configuration.
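As a minimal sketch of that step, run as root on the stopped node: it assumes the default package paths and that a `rabbitmq` user and group exist, and the function name `fix_rabbitmq_ownership` is hypothetical, not part of RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper: give RabbitMQ's data and log directories back to the
# rabbitmq user after a node that ran as root has been stopped.
# Assumes the default package layout; run it as root.
fix_rabbitmq_ownership() {
    # First argument: owner:group (defaults to rabbitmq:rabbitmq).
    owner="${1:-rabbitmq:rabbitmq}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    # Remaining arguments: directories to fix (default: the package locations).
    if [ "$#" -eq 0 ]; then
        set -- /var/lib/rabbitmq /var/log/rabbitmq
    fi
    # -R recurses into mnesia/, the queue files, the logs, etc.
    chown -R "$owner" "$@"
}
```

Calling `fix_rabbitmq_ownership` with no arguments applies `rabbitmq:rabbitmq` to both default directories; the optional arguments only exist so the same function can be pointed at a non-default layout.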
Because the node previously ran as root, Erlang stored its cookie (the shared secret used to authenticate inter-node communication) in `/root/.erlang.cookie`. You need to copy it to `/var/lib/rabbitmq/.erlang.cookie` and fix its ownership and permissions: it must be readable by the owner only, so use a mode of `0400` or `0600`; Erlang will complain if it is readable by the group or by others.
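A minimal sketch of the cookie copy, assuming the paths above and an existing `rabbitmq` user and group; `install_erlang_cookie` is a hypothetical name and the script is meant to be run as root.

```shell
#!/bin/sh
# Hypothetical helper: copy root's Erlang cookie into the rabbitmq user's
# home and make it readable by its owner only (0400).
# Defaults assume the package layout described above; run it as root.
install_erlang_cookie() {
    src="${1:-/root/.erlang.cookie}"
    dst="${2:-/var/lib/rabbitmq/.erlang.cookie}"
    owner="${3:-rabbitmq:rabbitmq}"
    # Copy the shared secret; every node in the cluster must use the same value.
    cp "$src" "$dst" || return 1
    chown "$owner" "$dst" || return 1
    # Erlang refuses a cookie file that the group or others can read.
    chmod 0400 "$dst"
}
```

Setting the mode last means the file ends up owner-read-only regardless of the umask in effect when `cp` created it.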
You can and should do this one node at a time (unless you upgraded Erlang or RabbitMQ in the meantime). Pay attention to the Erlang cookie mentioned above: if you start a node with a cookie that differs from the other running node's, the two nodes will not be able to communicate.
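A rough sketch of the per-node sequence, to be run on one node at a time as root. It assumes that `rabbitmqctl` run as root can still reach the root-started node, that `sudo` is installed, and that the ownership and cookie steps described above are carried out between the stop and the start; the function name and the 60-second timeout are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a rolling restart: run it on ONE node at a time,
# as root, and only move on once this node has rejoined the cluster.
restart_node_as_rabbitmq() {
    rabbitmqctl stop || return 1

    # ... change ownership and copy the Erlang cookie here ...

    # Start the node again, this time as the rabbitmq user (-H sets HOME
    # so that Erlang finds /var/lib/rabbitmq/.erlang.cookie).
    sudo -u rabbitmq -H rabbitmq-server -detached || return 1

    # Wait (up to ~60 s) until the node answers `rabbitmqctl status`.
    tries=0
    until rabbitmqctl status >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            echo "node did not come back up" >&2
            return 1
        fi
        sleep 2
    done

    # Inspect this output: both nodes should be listed as running
    # before you repeat the procedure on the other node.
    rabbitmqctl cluster_status
}
```

Stopping only one node at a time keeps the other node serving clients, which is why the sketch ends by printing the cluster status rather than moving on automatically.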
To ensure the cookie is correct before you restart RabbitMQ, you can try to ping the other running RabbitMQ node:
```shell
# Open a shell as the `rabbitmq` user and run:
erl -A0 -noinput -noshell -sname foobar \
    -eval "io:format(\"~p~n\", [net_adm:ping('rabbit@other-hostname')]), halt()."
```
In the command above, replace `other-hostname` with the hostname of the other RabbitMQ node. The command should print `pong` if everything is OK. If it prints `pang`, something is wrong: the connection failed, most likely because the cookies differ or the hostname cannot be reached.
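If you want to use this check inside a script, a minimal wrapper could turn the printed atom into an exit code. The function name `cookie_ping` and the generated short name are hypothetical; it assumes it is run as the `rabbitmq` user (for example from `sudo -u rabbitmq -H sh`), because `erl` reads the cookie from `$HOME/.erlang.cookie`.

```shell
#!/bin/sh
# Hypothetical wrapper around the ping command above: prints the result and
# returns 0 on "pong", non-zero otherwise.
cookie_ping() {
    target="$1"   # e.g. rabbit@other-hostname
    # A unique short name avoids clashing with another node named "foobar".
    result=$(erl -A0 -noinput -noshell -sname "cookiecheck$$" \
        -eval "io:format(\"~p~n\", [net_adm:ping('$target')]), halt().")
    echo "$result"
    [ "$result" = "pong" ]
}
```

For example, `cookie_ping rabbit@other-hostname && rabbitmq-server -detached` would only start the local node once the cookie check passes.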