We are upgrading our system from RabbitMQ v3.6.10 with Erlang v19.3.4 to RabbitMQ v3.8.24 with Erlang v23.3.4.8.
We use RightScale to manage our deployments. During resiliency testing on a three-node cluster, we deleted one node (node3), and as a result a new node (node4) was launched automatically with the same cluster ID. All the cluster join commands are in place and work correctly with 3.6.10. After the upgrade, however, the newly launched node running v3.8.24 cannot join the cluster. Instead, it treats itself as a new single-node deployment.
On the first and second nodes, we see the following errors in the crash.log file:
2022-02-17 09:01:32 =ERROR REPORT====
** gen_event handler lager_exchange_backend crashed.
** Was installed in lager_event
** Last event was: {log,{lager_msg,[],[{pid,<0.44.0>}],info,{["2022",45,"02",45,"17"],["07",58,"45",58,"12",46,"982"]},{1645,83912,982187},[65,112,112,108,105,99,97,116,105,111,110,32,"mnesia",32,101,120,105,116,101,100,32,119,105,116,104,32,114,101,97,115,111,110,58,32,"stopped"]}}
** When handler state == {state,{mask,127},lager_default_formatter,[date," ",time," ",color,"[",severity,"] ",{pid,[]}," ",message,"\n"],-576448326,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}}
** Reason == {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,367}]},{rabbit_basic,publish,1,[{file,"src/rabbit_basic.erl"},{line,65}]},{lager_exchange_backend,handle_log_event,2,[{file,"src/lager_exchange_backend.erl"},{line,173}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,620}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,602}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,604}]},{gen_event,handle_msg,6,[{file,"gen_event.erl"},{line,343}]}]}

2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbit@node-4' rejected. Invalid challenge reply. **

2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbitmqcli-481-rabbit@node-4' rejected. Invalid challenge reply. **
*node-4 is the new node that was launched automatically.
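The lager message in the first report is stored as an Erlang character list, which makes it hard to read in the raw log. The short Python sketch below flattens the date, time, and message fields copied from that report back into text; the helper name iolist_to_str is hypothetical. The decoded message reads "Application mnesia exited with reason: stopped", which suggests the lager_exchange_backend crash is a side effect of Mnesia stopping (the rabbit_exchange ETS lookup in the stack trace fails with badarg) rather than a separate problem.

```python
# Decode the lager_msg fields copied from crash.log.
# Erlang prints these deep character lists as a mix of integer
# character codes and "string" chunks, possibly nested.

def iolist_to_str(iolist):
    """Flatten an Erlang-style deep character list into a Python string (hypothetical helper)."""
    parts = []
    for item in iolist:
        if isinstance(item, int):
            parts.append(chr(item))  # integer character code
        elif isinstance(item, str):
            parts.append(item)  # printable chunk Erlang showed as "..."
        else:
            parts.append(iolist_to_str(item))  # nested list
    return "".join(parts)


# Fields of the lager_msg record, rewritten as Python list literals.
date = ["2022", 45, "02", 45, "17"]
time = ["07", 58, "45", 58, "12", 46, "982"]
message = [65, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 32, "mnesia", 32,
           101, 120, 105, 116, 101, 100, 32, 119, 105, 116, 104, 32,
           114, 101, 97, 115, 111, 110, 58, 32, "stopped"]

if __name__ == "__main__":
    print(iolist_to_str(date), iolist_to_str(time), iolist_to_str(message))
    # -> 2022-02-17 07:45:12.982 Application mnesia exited with reason: stopped
```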
Here we have two concerns.
Regards Kushagra
I was able to resolve the issue after some research, so I am sharing my findings here in case they help someone else.
RabbitMQ recommends keeping cluster membership relatively static. A node that becomes unresponsive may be able to rejoin the cluster once it recovers, so removing nodes dynamically is not recommended. Please refer to https://www.rabbitmq.com/cluster-formation.html#:~:text=Nodes%20in%20clusters,understood%20and%20considered.
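As one illustration of static membership, the full node list can be fixed in rabbitmq.conf using the classic_config peer discovery backend. The Python sketch below only generates those configuration lines. The helper classic_config_lines and the node names are hypothetical, and whether this backend suits a RightScale-managed deployment is an assumption to check against the cluster formation guide. Note that peer discovery only runs when a node with a blank data directory boots for the first time.

```python
# Hypothetical helper: render the static peer-discovery section of
# rabbitmq.conf (classic_config backend) for a fixed list of members.

def classic_config_lines(nodes):
    """Return rabbitmq.conf lines that list every cluster member statically."""
    if not nodes:
        raise ValueError("at least one node name is required")
    lines = ["cluster_formation.peer_discovery_backend = classic_config"]
    # Member entries are numbered starting from 1, one line per node.
    for index, node in enumerate(nodes, start=1):
        if "@" not in node:
            raise ValueError(f"node name must look like name@host: {node!r}")
        lines.append(f"cluster_formation.classic_config.nodes.{index} = {node}")
    return lines


if __name__ == "__main__":
    # Placeholder node names for this example only.
    members = ["rabbit@node-1", "rabbit@node-2", "rabbit@node-3"]
    print("\n".join(classic_config_lines(members)))
```

The same rendered lines would go into rabbitmq.conf on every member, so a replacement node booting with a blank data directory looks for the same, known set of peers.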
After due diligence, if you do want to remove a node that is no longer in use, run rabbitmqctl forget_cluster_node on any working cluster member and pass the name of the departed node. This cleans up the departed node's entries from the cluster membership.
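The removal step can be scripted. The sketch below assumes Python 3.8+ (for shlex.join) and that rabbitmqctl is on the PATH of a host that can reach a running member. forget_node_command and forget_node are hypothetical helper names, and rabbit@node-1 / rabbit@node-3 are placeholder node names. By default it only prints the command (dry run). The departed node must be offline when forget_cluster_node runs.

```python
import shlex
import subprocess


def forget_node_command(departed_node, via_node=None):
    """Build the rabbitmqctl argv that removes departed_node from the cluster.

    via_node optionally selects which running member executes the command (-n).
    """
    cmd = ["rabbitmqctl"]
    if via_node is not None:
        cmd += ["-n", via_node]
    cmd += ["forget_cluster_node", departed_node]
    return cmd


def forget_node(departed_node, via_node=None, dry_run=True):
    """Print (dry run) or execute the forget_cluster_node command."""
    cmd = forget_node_command(departed_node, via_node)
    if dry_run:
        print(shlex.join(cmd))
        return None
    # check=True raises CalledProcessError if rabbitmqctl exits non-zero.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Dry run with placeholder node names.
    forget_node("rabbit@node-3", via_node="rabbit@node-1")
    # prints: rabbitmqctl -n rabbit@node-1 forget_cluster_node rabbit@node-3
```

Passing dry_run=False runs the command and raises subprocess.CalledProcessError if rabbitmqctl reports a failure.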
I hope this helps.
Regards Kushagra