Reputation: 6058
I have a 3-node Cassandra cluster with a replication factor of 3. This means that all data should be replicated onto all 3 nodes.
The following is the output of nodetool status:
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.1  27.66 GB  256     100.0%            2e89198f-bc7d-4efd-bf62-9759fd1d4acc  RAC1
UN  192.168.0.2  28.77 GB  256     100.0%            db5fd62d-3381-42fa-84b5-7cb12f3f946b  RAC1
UN  192.168.0.3  27.08 GB  256     100.0%            1ffb4798-44d4-458b-a4a8-a8898e0152a2  RAC1
This is a graph of disk usage over time on all 3 of the nodes:
My question is: why do these sizes vary so much? Is it because compaction hasn't run at the same time on each node?
Upvotes: 1
Views: 407
Reputation: 9475
I would say several factors could play a role here.
As you note, compaction will not run at the same time, so the number and contents of the SSTables will be somewhat different on each node.
The memtables will also not have been flushed to SSTables at the same time, so right from the start, each node will have somewhat different SSTables.
If you're using compression for the SSTables, given that their contents are somewhat different, the amount of space saved by compressing the data will vary somewhat.
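You can check both of these factors directly on each node. As a sketch (assuming a Cassandra version where `nodetool tablestats` exists; older releases call the same command `cfstats`, and `my_keyspace.my_table` is a placeholder for your own names):

```shell
# Run on each node and compare the output side by side.

# Pending/active compactions differ per node at any given moment:
nodetool compactionstats

# Per-table SSTable count, live disk usage, and compression ratio
# (substitute your own keyspace and table names):
nodetool tablestats my_keyspace.my_table | \
  grep -E 'SSTable count|Space used \(live\)|Compression Ratio'
```

If the SSTable counts and compression ratios differ noticeably between nodes, that alone accounts for a fair amount of on-disk variance.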
And even with a replication factor of three, the storage layout for non-primary-range data can differ slightly from that of primary-range data, and it's likely that more primary-range data is mapped to one node than another.
So basically, unless each node saw the exact same sequence of writes at exactly the same time, they won't end up with exactly the same data size on disk.
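If you want the sizes to be directly comparable as a one-off check, you could flush and compact each node yourself (keyspace name is a placeholder). Be aware that a major compaction under SizeTieredCompactionStrategy merges everything into one large SSTable, which can hurt future compaction behavior, so this is for a one-time comparison rather than routine maintenance:

```shell
# On each node: flush memtables to SSTables, then force a compaction.
nodetool flush
nodetool compact my_keyspace   # my_keyspace is a placeholder
```

After this, any remaining size difference is down to compression ratios and the actual token-range data each node holds, not compaction timing.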
Upvotes: 3