What happens to a RabbitMQ cluster if the only disc node dies?

Question

RabbitMQ clusters need to have at least one disc node (you can't turn your last disc node to a ram node).

However (especially in a cloud context) nodes can die - what is supposed to happen to the cluster if the only disc node dies?

Does the cluster automatically appoint a new disc node, or it continues working with no disc node.

pinepain · Accepted Answer

Short answer: In case all disc nodes dies and you have at least one RAM node you'll get RAM-only cluster. In case only one RAM node left and it goes down and then up, only durable entities will reside on it.

Long answer:

If you use clustering as it described in Clustering Guide queues reside only on one node:

All data/state required for the operation of a RabbitMQ broker is replicated across all nodes, for reliability and scaling, with full ACID properties. An exception to this are message queues, which by default reside on the node that created them, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability (note that you will need a working cluster first).

So when node dies (not only disc one, it applied to RAM too) you lose queues (with content) resides on that node.

If you use High Availability to mirror queue across more than one nodes (actually, it depends how you set it up, see detailed explanation on ha-mode and ha-policy policy keys - all, exactly and nodes).

With HA, if queue has some ha-policy set and the node it reside dies, that queue will be tried to be mirrored to other nodes, including RAM-only one (sure, it depends how you set up ha-mode, for example if it set to nodes and all nodes from list dies you lose the queue).

So after such intro,

If you turn off all disc nodes and you have only RAM nodes and queues fits the memory everything will works normally. If queues doesn't fit in memory, Flow Control memory limits applied which explained in clustering doc in Restarting section (at the end of e:

At least one disk node should be running at all times to prevent data loss. RabbitMQ will prevent the creation of a RAM-only cluster in many situations, but it still won't stop you from stopping and forcefully resetting all the disc nodes, which will lead to a RAM-only cluster. Doing this is not advisable and makes losing data very easy.

and a bit more from clustering doc:

A node can be a disk node or a RAM node. (Note: disk and disc are used interchangeably. Configuration syntax or status messages normally use disc.) RAM nodes keep their state only in memory (with the exception of queue contents, which can reside on disc if the queue is persistent or too big to fit in memory). Disk nodes keep state in memory and on disk. As RAM nodes don't have to write to disk as much as disk nodes, they can perform better. However, note that since the queue data is always stored on disc, the performance improvements will affect only resources management (e.g. adding/removing queues, exchanges, or vhosts), but not publishing or consuming speed. Because state is replicated across all nodes in the cluster, it is sufficient (but not recommended) to have just one disk node within a cluster, to store the state of the cluster safely.

So if you don't literally add any disc node you'll get RAM-only cluster. It may be fast in some cases, but if all nodes goes down you will lose all your queues with it content forever, except durable ones while any node dump persistent queues and messages on disc.

But don't rely on RAM node dump persistent entities on disc, while under certain situations it may not dump at all or not all entities (especially, messages).

There are old mailing list threads which may bring some extra light on situation:

What happens to a RabbitMQ cluster if the only disc node dies?

Answers (1)

Related Questions