Reputation: 6058
I have a 3-node Cassandra cluster with a replication factor of 3. This means that all data should be replicated onto all 3 nodes.
The following is the output of nodetool status:
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.1  27.66 GB  256     100.0%            2e89198f-bc7d-4efd-bf62-9759fd1d4acc  RAC1
UN  192.168.0.2  28.77 GB  256     100.0%            db5fd62d-3381-42fa-84b5-7cb12f3f946b  RAC1
UN  192.168.0.3  27.08 GB  256     100.0%            1ffb4798-44d4-458b-a4a8-a8898e0152a2  RAC1
This is a graph of disk usage over time on all 3 of the nodes:
My question is: why do these sizes vary so much? Is it because compaction hasn't run at the same time on each node?
Upvotes: 1
Views: 407
Reputation: 9475
I would say several factors could play a role here.
As you note, compaction will not run at the same time, so the number and contents of the SSTables will be somewhat different on each node.
The memtables will also not have been flushed to SSTables at the same time, so right from the start, each node will have somewhat different SSTables.
If you're using compression for the SSTables, given that their contents are somewhat different, the amount of space saved by compressing the data will vary somewhat.
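You can check both of these factors directly on each node. As a sketch (assuming a Cassandra version where `nodetool tablestats` exists; older releases call the same command `cfstats`, and `my_keyspace.my_table` is a placeholder for your own names):

```shell
# Run on each node and compare the output side by side.

# Pending/active compactions differ per node at any given moment:
nodetool compactionstats

# Per-table SSTable count, live disk usage, and compression ratio
# (substitute your own keyspace and table names):
nodetool tablestats my_keyspace.my_table | \
  grep -E 'SSTable count|Space used \(live\)|Compression Ratio'
```

If the SSTable counts and compression ratios differ noticeably between nodes, that alone accounts for a fair amount of on-disk variance.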
And even with a replication factor of three, the storage layout for non-primary-range data can differ slightly from that of primary-range data, and it's likely that more primary-range data is mapped to one node than another.
So basically, unless each node saw the exact same sequence of writes at exactly the same time, they won't end up with exactly the same data size on disk.
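If you want the sizes to be directly comparable as a one-off check, you could flush and compact each node yourself (keyspace name is a placeholder). Be aware that a major compaction under SizeTieredCompactionStrategy merges everything into one large SSTable, which can hurt future compaction behavior, so this is for a one-time comparison rather than routine maintenance:

```shell
# On each node: flush memtables to SSTables, then force a compaction.
nodetool flush
nodetool compact my_keyspace   # my_keyspace is a placeholder
```

After this, any remaining size difference is down to compression ratios and the actual token-range data each node holds, not compaction timing.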
Upvotes: 3