Kushagra Bindal

Reputation: 131

Node is not able to join the cluster after upgrading to RabbitMQ v3.8.24

We are upgrading our system from RabbitMQ 3.6.10 with Erlang 19.3.4 to RabbitMQ 3.8.24 with Erlang 23.3.4.8.

We use RightScale for our deployments. During resiliency testing on a 3-node cluster, we deleted one node (node3), and as a result a new node (node4) was launched automatically with the same cluster ID. All the cluster join commands are in place and work properly on 3.6.10. After the upgrade, however, the newly launched node on v3.8.24 is not able to join the cluster. Instead, it treats itself as a new single-node deployment.

On the 1st and 2nd nodes we see the errors below in the crash.log file.

2022-02-17 09:01:32 =ERROR REPORT====
** gen_event handler lager_exchange_backend crashed.
** Was installed in lager_event
** Last event was: {log,{lager_msg,[],[{pid,<0.44.0>}],info,{["2022",45,"02",45,"17"],["07",58,"45",58,"12",46,"982"]},{1645,83912,982187},[65,112,112,108,105,99,97,116,105,111,110,32,"mnesia",32,101,120,105,116,101,100,32,119,105,116,104,32,114,101,97,115,111,110,58,32,"stopped"]}}
** When handler state == {state,{mask,127},lager_default_formatter,[date," ",time," ",color,"[",severity,"] ",{pid,[]}," ",message,"\n"],-576448326,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}}
** Reason == {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,367}]},{rabbit_basic,publish,1,[{file,"src/rabbit_basic.erl"},{line,65}]},{lager_exchange_backend,handle_log_event,2,[{file,"src/lager_exchange_backend.erl"},{line,173}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,620}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,602}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,604}]},{gen_event,handle_msg,6,[{file,"gen_event.erl"},{line,343}]}]}

2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbit@node-4' rejected. Invalid challenge reply. **

2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbitmqcli-481-rabbit@ node -4' rejected. Invalid challenge reply. **

*node-4 is the new node that was launched automatically.

We have two questions:

  1. Why is the newly launched node not able to join the cluster?
  2. After the old node is terminated, its details are still listed in the Disc Nodes section. Is there a specific reason for retaining them, or are there configuration changes we need to make?

Regards, Kushagra

Upvotes: 1

Views: 678

Answers (1)

Kushagra Bindal

Reputation: 131

I was able to resolve the issue after some searching, so I thought I would share my findings in case they help someone.

According to the RabbitMQ recommendations, it is best for a RabbitMQ cluster to consist of a static set of nodes.

An unresponsive node may be able to rejoin the cluster once it recovers, so dynamic removal of nodes is not recommended. Please refer to https://www.rabbitmq.com/cluster-formation.html#:~:text=Nodes%20in%20clusters,understood%20and%20considered.

After due diligence, if we still want to remove an unused node, we can run rabbitmqctl forget_cluster_node from any working node and pass it the name of the expired node. This cleans up all of its entries.
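The removal step above could be scripted roughly as follows. This is only a minimal sketch, not what I actually ran: the node name rabbit@node-3 and the helper names forget_node_command and forget_node are made up for illustration, and it assumes rabbitmqctl is on the PATH of a healthy cluster member. By default it only returns the command it would run, so you can review it first.

```python
import subprocess


def forget_node_command(stale_node):
    """Build the rabbitmqctl argument list that removes a stale node."""
    # stale_node is the full Erlang node name, e.g. "rabbit@node-3" (hypothetical)
    return ["rabbitmqctl", "forget_cluster_node", stale_node]


def forget_node(stale_node, dry_run=True):
    """Return the command as a string; execute it only when dry_run is False."""
    cmd = forget_node_command(stale_node)
    if not dry_run:
        # Must be run on a working cluster member; raises CalledProcessError on failure
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run: prints the command instead of touching the cluster
    print(forget_node("rabbit@node-3"))
```

Calling forget_node("rabbit@node-3", dry_run=False) on one of the surviving nodes would actually run the command. Afterwards, rabbitmqctl cluster_status should no longer list the removed node under the disc nodes.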

I hope this helps.

Regards, Kushagra

Upvotes: 1
