soubhagya senapati

Reputation: 33

Data not distributed across cluster in Cassandra

We are using a 3-node cluster with REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}.

But when we insert data, the same row appears on all three nodes (I see it when I run the query on each node individually).

When I run nodetool status, I see the following:

--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.46.89   6.43 MiB   256          32.8%             2db6dc5c-9d05-4dc7-9bf5-ea9e3c406267  rack1
UN  172.31.47.150  13.17 MiB  256          32.1%             eb10cc48-6117-427c-9151-48cb6761a5e6  rack1
DN  172.31.45.131  12.73 MiB  256          35.1%             cc33fc04-a02f-41e2-a00b-3835a0d98cb5  rack1

Can anyone help me understand why the data is present on all nodes?

Upvotes: 0

Views: 133

Answers (2)

Mehul Gupta

Reputation: 458

Data will not be stored on all nodes when RF=1. Instead, whichever node you connect to acts as the coordinator: it fetches the data from the node responsible for it and returns the response to you. That is why the same row shows up no matter which node you query.

On a write, the coordinator stores the data locally only if it happens to be one of the nodes responsible for the data's token range.
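To make the coordinator behaviour concrete, here is a minimal toy sketch in Python. It is not Cassandra code: the `ToyCluster` class, the `token` helper, and the MD5-based hash are assumptions made for illustration (a real cluster defaults to the Murmur3 partitioner), and only the node addresses come from the question. It places each key on a single replica, as SimpleStrategy with RF=1 would, and shows that a read through any coordinator still returns the row.

```python
import bisect
import hashlib

# Node addresses taken from the nodetool status output in the question.
NODES = ["172.31.46.89", "172.31.47.150", "172.31.45.131"]


def token(text):
    """Stand-in hash (MD5-based); real clusters default to Murmur3Partitioner."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


class ToyCluster:
    """Hypothetical in-memory model of a token ring; rf must not exceed the node count."""

    def __init__(self, nodes, vnodes=256, rf=1):
        self.rf = rf
        # Each node gets `vnodes` tokens on the ring, sorted by token value.
        self.ring = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]
        self.storage = {n: {} for n in nodes}

    def replicas(self, key):
        # Walk clockwise from the key's token, collecting rf distinct nodes.
        i = bisect.bisect_left(self.tokens, token(key))
        owners = []
        while len(owners) < self.rf:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

    def write(self, coordinator, key, value):
        # The coordinator forwards the write; it stores locally only if it is a replica.
        for node in self.replicas(key):
            self.storage[node][key] = value

    def read(self, coordinator, key):
        # Any coordinator asks the owning replica, so every node can answer the query.
        return self.storage[self.replicas(key)[0]].get(key)


cluster = ToyCluster(NODES)
cluster.write(NODES[0], "user:1", "alice")
for coordinator in NODES:
    print(coordinator, "->", cluster.read(coordinator, "user:1"))
print("stored on:", [n for n in NODES if "user:1" in cluster.storage[n]])
```

Every coordinator prints the same value, while the final line lists exactly one node, which mirrors what the question describes seeing from each node.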

Upvotes: 0

Chris Lohfink

Reputation: 16430

Cassandra is masterless: when you query any node in the cluster, that node asks the appropriate replica to answer your query. The data will not be stored on all nodes with RF=1. If you really want to verify this, look in your data/keyspace/table directory on each node and run sstabledump on the Data file.
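If you save the sstabledump output from each node, a small script can list which partition keys that node actually holds. The sketch below is only an illustration: the `partition_keys` helper is hypothetical, the sample text is hand-written rather than real output, and it assumes the default JSON layout in which each entry carries a "partition" object with a "key" list.

```python
import json


def partition_keys(dump_text):
    """Return the partition keys found in sstabledump-style JSON text.

    Assumes a top-level JSON array whose entries look like
    {"partition": {"key": [...]}, "rows": [...]}.
    """
    partitions = json.loads(dump_text)
    return [tuple(p["partition"]["key"]) for p in partitions]


# Hand-written sample for illustration, not output from a real cluster.
sample = """
[
  {"partition": {"key": ["42"], "position": 0}, "rows": []},
  {"partition": {"key": ["7"], "position": 31}, "rows": []}
]
"""

print(partition_keys(sample))
```

Running this against the dump from each node and comparing the key lists should show each partition key on only one node when RF=1. Note that recent writes may still be in the memtable, so flushing the table first (for example with nodetool flush) makes the SSTables reflect them.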

Upvotes: 1
