ruanhao

Reputation: 4922

RabbitMQ application stops when another node in cluster is shutdown

I am new to RabbitMQ and I have troubles when handling RabbitMQ cluster.

The topology is like:

[Topology diagram: two RabbitMQ nodes running as Kubernetes pods, clustered via the autocluster plugin]

At first, everything is OK. RabbitMQ node1 and node2 are in a cluster, interconnected by a RabbitMQ plugin called autocluster.

Then I delete pod rabbitmq-1 with kubectl delete pod rabbitmq-1, and I find that the RabbitMQ application on node1 stops as well. I don't understand why RabbitMQ would stop its application when it detects another node's failure. It does not make sense to me. Is this behaviour designed by RabbitMQ or by autocluster? Can you enlighten me?

My config is like:

[
  {rabbit, [
    {tcp_listen_options, [
                          {backlog,       128},
                          {nodelay,       true},
                          {linger,        {true,0}},
                          {exit_on_close, false},
                          {sndbuf,        12000},
                          {recbuf,        12000}
                         ]},
    {loopback_users, [<<"guest">>]},
    {log_levels,[{autocluster, debug}, {connection, debug}]},
    {cluster_partition_handling, pause_minority},
    {vm_memory_high_watermark, {absolute, "3276MiB"}}
  ]},

  {rabbitmq_management, [
    {load_definitions, "/etc/rabbitmq/rabbitmq-definitions.json"}
  ]},

  {autocluster, [
    {dummy_param_without_comma, true},
    {autocluster_log_level, debug},
    {backend, etcd},
    {autocluster_failure, ignore},
    {cleanup_interval, 30},
    {cluster_cleanup, false},
    {cleanup_warn_only, false},
    {etcd_ttl, 30},
    {etcd_scheme, http},
    {etcd_host, "etcd.kube-system.svc.cluster.local"},
    {etcd_port, 2379}
   ]}
]

In my case, x-ha-policy is enabled.
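For context, queue mirroring in modern RabbitMQ is configured through server-side policies rather than the old x-ha-policy queue argument. A sketch of an equivalent policy (the policy name ha-all is illustrative, not from the question):

```
# Mirror every queue across all cluster nodes
# ("ha-all" is an illustrative policy name; "^" matches all queue names)
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'
```
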

Upvotes: 1

Views: 1944

Answers (1)

svenwltr

Reputation: 18452

You set cluster_partition_handling to pause_minority. In a two-node cluster, the surviving node is not a majority (one of two is exactly half), so it pauses itself as configured. You either have to add an additional node or set cluster_partition_handling to ignore.

From the docs:

In pause-minority mode RabbitMQ will automatically pause cluster nodes which determine themselves to be in a minority (i.e. fewer or equal than half the total number of nodes) after seeing other nodes go down. It therefore chooses partition tolerance over availability from the CAP theorem. This ensures that in the event of a network partition, at most the nodes in a single partition will continue to run. The minority nodes will pause as soon as a partition starts, and will start again when the partition ends.
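For a two-node cluster, one way to keep the remaining node running (accepting the risk of inconsistency during a real network partition) is to change the partition-handling mode. A minimal sketch in the same classic-config format as the question:

```
[
  {rabbit, [
    %% "ignore" keeps nodes running through a partition;
    %% "autoheal" instead restarts the losing side once the partition heals.
    {cluster_partition_handling, ignore}
  ]}
].
```

With three or more nodes, pause_minority is generally the safer choice, since a true majority can remain available while the minority pauses.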

Upvotes: 2
