Reputation: 565
I have a single datacenter with a 5-node Cassandra cluster up and running. I have created a keyspace with RF=3 and SimpleStrategy. I need clarity on the points below:
nodetool getendpoints tells me on which nodes my data is physically stored, so I know that my data is stored on node-2, 3, 4. But I can still see the respective SSTable and *.db files on every node. If the data is physically stored on node-2, 3, 4, why does it show up on node-0 and node-1 as well?
After connecting through the cqlsh client, the CONSISTENCY command shows ONE as the default consistency level. My understanding is that any read or write I perform through the COPY command will be executed at consistency ONE. So if I take three or four nodes down at the same time, leaving up only one node from 2, 3, 4, will it have any impact on the client connection (Cassandra read or write queries)?
Please let me know, so that I can get a better understanding of Cassandra concepts.
Upvotes: 1
Views: 109
Reputation: 87329
Cassandra distributes data across all nodes of the cluster based on the hash value of the partition key (this hash value is often called a token) and the number of replicas. For example, a partition key with value 1 could be on nodes (3, 4, 5), one with value 2 on nodes (5, 1, 2), and so on. That is why every node has SSTable files: each node holds the partitions whose tokens it is responsible for. nodetool getendpoints gives you the location of the rows with a specific partition key, not of all data. You can read more about it in the following blog post.
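To make the placement idea concrete, here is a minimal sketch of a token ring. It is a simplified model, not Cassandra's actual code: it assumes one token per node (no vnodes), uses MD5 as a stand-in for Cassandra's Murmur3 partitioner, and the helper names (`token_for`, `build_ring`, `replicas_for`) are made up for illustration. It mimics SimpleStrategy by taking the node that owns the token plus the next RF-1 nodes clockwise.

```python
import hashlib
from bisect import bisect_left


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 here; Cassandra uses Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes):
    """Assign one evenly spaced token per node (assumption: no vnodes)."""
    step = 2**64 // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas_for(partition_key: str, ring, rf: int):
    """Pick the owner of the key's token, then the next rf-1 nodes clockwise."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect_left(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = build_ring([f"node-{i}" for i in range(5)])
    for key in ("1", "2", "3", "4"):
        print(key, "->", replicas_for(key, ring, rf=3))
```

Different keys land on different groups of three nodes, so over many keys every node in the cluster ends up storing some data.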
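CL ONE means that we need confirmation of a read or write operation from only one replica of the data (you have three replicas because of RF=3). Writes are always sent to all replicas, but the operation is confirmed as soon as one of them answers with success. For your specific question: for a given partition key you can tolerate the loss of 2 of the 3 nodes that are responsible for it. Keep in mind that this holds per partition key; if you take three or four nodes down, other partition keys may have all of their replicas on the down nodes, and queries for those will fail.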
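The availability rule can be sketched in a few lines. This is only an illustrative model, not a driver API: `required_acks` and `can_serve` are hypothetical helpers, the node names follow the question, and the second replica set is invented to show a partition whose replicas are all down.

```python
def required_acks(consistency_level: str, rf: int) -> int:
    """Number of replica responses needed for the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1
    if consistency_level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_serve(replicas, down_nodes, consistency_level: str) -> bool:
    """True if enough replicas of this partition are still up."""
    alive = [node for node in replicas if node not in down_nodes]
    return len(alive) >= required_acks(consistency_level, len(replicas))


if __name__ == "__main__":
    down = {"node-0", "node-1", "node-3", "node-4"}  # only node-2 is up

    # Partition from the question: replicas on node-2, node-3, node-4.
    print(can_serve(["node-2", "node-3", "node-4"], down, "ONE"))     # True
    print(can_serve(["node-2", "node-3", "node-4"], down, "QUORUM"))  # False

    # Hypothetical other partition with no replica on node-2.
    print(can_serve(["node-4", "node-0", "node-1"], down, "ONE"))     # False
```

So at CL ONE the partition from the question stays readable and writable with only node-2 up, while a partition with no replica on node-2 does not.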
I recommend reading at least the first parts of Cassandra: The Definitive Guide, 3rd edition - it's freely available from DataStax. Or read the DataStax Enterprise Architecture Guide - it covers Cassandra's architecture as well.
P.S. I recommend using the DSBulk utility instead of the COPY command - it's heavily optimized for performance when loading or unloading data (especially large amounts of data), and it's much more flexible.
Upvotes: 1