Chakri Stark

Reputation: 176

Who holds the memtables and SSTables in Cassandra, the nodes or the cluster?

Does each node have its own memtables and SSTables, or does the cluster as a whole have a certain number of these tables? Also, in a write operation, data is first written to the commit log and then to memtables and SSTables. Is this done by the node? If not, what is the role of the node in the write operation, as shown in the picture linked below? https://www.google.co.in/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0ahUKEwjEmYGCkovXAhXEMo8KHeLtD48QjRwIBw&url=https%3A%2F%2Fwww.guru99.com%2Fcassandra-architecture.html&psig=AOvVaw0rqVl6BG9vn0TefAPCEb5t&ust=1508999138916139

Upvotes: 1

Views: 1214

Answers (2)

Abhishek Raj

Reputation: 512

Whenever you create a table in Cassandra, a memtable is created for it. Thus a node may have many memtables. An SSTable is created when a flush is triggered.

See this http://abiasforaction.net/apache-cassandra-memtable-flush/

For the other question (the write path):
The operations are carried out by the node itself, with the coordinator node orchestrating them.
Whenever data is inserted, it is appended to the commit log and written to the memtable. Commit logs are replayed when a node restarts after going down, so that writes not yet flushed to SSTables are recovered.

So if you flush the data again after this new insertion, you will see two sets (generations) of SSTables. Your partition data now exists in multiple SSTables.
Note that SSTables are immutable. Later on, you may also want to read about how compaction kicks in.
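
To make the write path concrete, here is a minimal Python sketch of a single node. It is a toy model, not Cassandra's actual code: the class name `ToyNode`, the `flush_threshold` parameter and the in-memory lists standing in for on-disk files are all made up for illustration. Real Cassandra also discards commit log segments after a flush and reconciles reads by write timestamp, which this sketch does not do.

```python
# Toy model of one node's write path; NOT Cassandra's real implementation.
class ToyNode:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only log (a file on disk in Cassandra)
        self.memtables = {}    # table name -> in-memory {partition key: value}
        self.sstables = {}     # table name -> list of immutable generations
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, table, key, value):
        # 1. Append to the commit log for durability.
        self.commit_log.append((table, key, value))
        # 2. Update this table's memtable (created on first use).
        memtable = self.memtables.setdefault(table, {})
        memtable[key] = value
        # 3. Flush once the memtable holds enough entries.
        if len(memtable) >= self.flush_threshold:
            self.flush(table)

    def flush(self, table):
        memtable = self.memtables.get(table)
        if not memtable:
            return
        # Freeze the memtable into a sorted, immutable tuple: one new generation.
        generation = tuple(sorted(memtable.items()))
        self.sstables.setdefault(table, []).append(generation)
        self.memtables[table] = {}

    def read(self, table, key):
        # Check the memtable first, then the SSTables from newest to oldest.
        memtable = self.memtables.get(table, {})
        if key in memtable:
            return memtable[key]
        for generation in reversed(self.sstables.get(table, [])):
            for stored_key, stored_value in generation:
                if stored_key == key:
                    return stored_value
        return None


node = ToyNode(flush_threshold=2)
node.write("users", "alice", "v1")
node.write("users", "bob", "v1")    # second entry triggers the first flush
node.write("users", "alice", "v2")  # lands in a fresh memtable
node.flush("users")                 # manual flush creates a second generation
```

After these calls the "alice" partition exists in two SSTable generations, and a read has to pick the newest one. That duplication is what compaction later merges away.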

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfigureCompaction.html

Upvotes: 3

Horia

Reputation: 2982

Each node has its own memtables and SSTables. Of course, there might be a coordinator node that fires the read or write operation, but the node holding the data is the one that actually executes it.
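
As a rough illustration of that split between coordinator and data-holding nodes, here is a small Python sketch. Everything in it is an assumption made for the example: the names `ToyReplica`, `replicas_for` and `coordinate_write` are hypothetical, and hashing the key with SHA-256 modulo the node count is only a stand-in for Cassandra's token ring and partitioner.

```python
import hashlib


class ToyReplica:
    """A node that stores data in its own memtable (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.memtable = {}  # private to this node, not shared with the cluster

    def apply_write(self, key, value):
        # The node holding the data is the one that executes the write.
        self.memtable[key] = value


def replicas_for(key, nodes, replication_factor):
    # Stand-in for the partitioner: hash the key to a starting position,
    # then take the next nodes clockwise around the "ring".
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def coordinate_write(key, value, nodes, replication_factor=2):
    # The coordinator only decides who owns the key and forwards the write.
    targets = replicas_for(key, nodes, replication_factor)
    for replica in targets:
        replica.apply_write(key, value)
    return [replica.name for replica in targets]


nodes = [ToyReplica(f"node{i}") for i in range(1, 5)]
written_to = coordinate_write("alice", "v1", nodes)
```

Only the nodes named in `written_to` end up with "alice" in their memtable. The other nodes never see the write, which is why memtables and SSTables are per node rather than per cluster.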

Also, a cluster is a bunch of nodes working together, but there is no "cluster" entity that coordinates everything.

In the picture mentioned, each red dot is a node, and nodes 1-4 and nodes 5-8 each form a data center, so we have two data centers, and all of them are part of the same cluster.

There are some important things worth mentioning when configuring a cluster: the cluster name, the partitioner and the snitch. All of these need to be the same on all nodes. Also, in a cluster you will define seed nodes; these don't need to be the same for all nodes, though it is a good idea for them to be the same.

Upvotes: 0
