Reputation: 466
We are using Cassandra to provide availability and backup-related features for our application.
I am now exploring the concepts of node, cluster, and datacenter. After reading everything, I am quite confused.
Locally, I have installed Cassandra on two machines, and both machines can communicate with each other. I can fetch the same information on both machines.
My confusion is: what is a node in my setup? (Does it mean my machine?)
My goal is to set up two Amazon EC2 instances, both running Cassandra.
If one instance is down, I want to be able to fetch my data from the other machine.
As I am a beginner, please give me your suggestions.
Thanks
Upvotes: 1
Views: 87
Reputation: 6341
A node is a specific instance of Cassandra running on a machine. In your scenario each machine would be a node.
The ability to fetch data from either machine will have more to do with what your replication factor and replication strategy are set to. The Replication Strategy tells Cassandra how to replicate the data across your nodes/racks/datacenters. The Replication Factor tells Cassandra how many times to replicate the data.
In your scenario (since you are in a single DC) you can use SimpleStrategy for your replication strategy and a Replication Factor (RF) of 2. With this setup all data is replicated on both nodes. This makes the data available from either node, with one caveat: the consistency level used by each query.
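As a rough sketch of the keyspace definition described above, the small Python helper below only builds the CQL text for a SimpleStrategy keyspace with a given RF. The keyspace name `my_app` and the helper name `create_keyspace_cql` are hypothetical; you would run the resulting statement yourself through cqlsh or your driver of choice.

```python
def create_keyspace_cql(keyspace: str, replication_factor: int) -> str:
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


# 'my_app' is a hypothetical keyspace name used only for illustration.
print(create_keyspace_cql("my_app", 2))
```

With an RF of 2 on a two-node cluster, each node holds a full copy of the data in that keyspace.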
In addition to the items listed above there is the concept of a Consistency Level (CL) that you set on both reads and writes. There are several CLs to choose from, and you can set them independently for read and write calls. In your scenario you would probably want a CL of ONE. This means a read or write call succeeds as soon as any one replica is able to respond to it. This allows one of your nodes to be down while queries are still processed.
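To see why ONE is the level that tolerates a node being down in a two-node setup, here is a deliberately simplified Python model of how many replicas must respond at a few common consistency levels. It is not driver code, the function names are made up for illustration, and it ignores details such as hinted handoff and the other available levels.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a request to succeed."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": replication_factor // 2 + 1,  # a majority of the replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def can_serve(consistency_level: str, replication_factor: int, live_replicas: int) -> bool:
    """Simplified check: are enough replicas alive to satisfy the level?"""
    return live_replicas >= required_acks(consistency_level, replication_factor)


# Two nodes, RF = 2, one node down, so only one replica is alive.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_serve(level, replication_factor=2, live_replicas=1))
```

Under this model, with RF 2 and one node down, only ONE can still be served: a quorum of two replicas is two, so QUORUM behaves like ALL in this particular setup.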
Here are some additional links to read more about these concepts:
Upvotes: 1