emilly

Reputation: 10540

Replication without partitioning in Cassandra

In MongoDB we can choose any of the models below:

  1. Simple replication (no sharding; one node works as the master and the others as slaves), or
  2. Sharding (data is distributed across different shards based on the partition key), or
  3. Both 1 and 2

My question: can't we run Cassandra with replication only, without partitioning, just like model 1 in MongoDB?

From "Cassandra vs MongoDB in respect of Secondary Index?":

In case of Cassandra, the data is distributed into multiple nodes based on the partition key.

From the above, it looks like distributing the data by some partition key is mandatory when we have more than one node?

Upvotes: 0

Views: 181

Answers (2)

skomp

Reputation: 459

Basically your intuition is right: the data is always distributed based on the partition key. The partition key is the first part of the primary key (it used to be called the row key), and every table needs a primary key, so you have one anyway. Case 1 of your Mongo example is not doable in Cassandra, mainly because Cassandra has no concept of masters and slaves. If you have a 2-node cluster and a replication factor of 2, then the data will be held on both nodes, as Alex Ott already pointed out. When you query (read or write), your client decides which node to connect to and performs the operation there. To my knowledge, the default here is round-robin load balancing between the two nodes, so each of them receives roughly the same load.

If you have 3 nodes and a replication factor of 2, it becomes a little trickier. The nice part, though, is that you can determine in the client code which nodes hold your data, so you don't lose any performance by connecting to a "wrong" node.
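To illustrate how a client can work out which nodes hold a given partition, here is a rough Python sketch of a token ring. It is not the driver's actual code: `token` and `replicas_for` are names made up for this example, md5 stands in for Cassandra's Murmur3 partitioner, each node gets a single token (no vnodes), and replicas are simply the next nodes clockwise on the ring, which is roughly what SimpleStrategy does.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Toy token function (assumption: md5 instead of Cassandra's Murmur3)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key: str, nodes: List[str], rf: int) -> List[str]:
    """Return the nodes that hold a partition on a simplified ring.

    The first replica is the node whose token is the first one at or after
    the key's token (wrapping around); the remaining replicas are the next
    nodes clockwise.
    """
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    copies = min(rf, len(ring))  # cannot place more copies than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(copies)]


# Hypothetical 3-node cluster with a replication factor of 2.
nodes = ["node-a", "node-b", "node-c"]
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", replicas_for(key, nodes, rf=2))
```

Each key maps to exactly two of the three nodes, and a client that knows the ring can send the request straight to one of them instead of a node that would only forward it.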

One more thing about partitions: you can configure some of this, but the setting is per server, not per table. I've never used it, and personally I wouldn't recommend it. Just stick to Cassandra's default mechanism.

And one word about the secondary index thing: use materialized views.

Upvotes: 0

Alex Ott

Reputation: 87329

In Cassandra, the replication factor defines how many copies of the data you have. The partition key is responsible for distributing the data between nodes. But this distribution depends on the number of nodes you have. For example, if you have a 3-node cluster and a replication factor of 3, then all nodes will get the data anyway...
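A small simulation makes the 3-node example concrete. This is only a sketch under simplifying assumptions: `ring_position` and `stored_share` are hypothetical helpers, SHA-1 replaces the real Murmur3 partitioner, each node has one position on the ring, and copies go to the next nodes clockwise. It counts what fraction of all partitions each node ends up storing.

```python
import hashlib
from collections import Counter


def ring_position(name):
    # Toy hash onto the ring (assumption: real Cassandra uses Murmur3 tokens).
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


def stored_share(node_count, rf, keys=1000):
    """Fraction of all partitions that each node ends up storing."""
    nodes = ["node-%d" % i for i in range(node_count)]
    ring = sorted(nodes, key=ring_position)
    copies = min(rf, node_count)  # cannot place more copies than nodes
    held = Counter()
    for k in range(keys):
        pos = ring_position("key-%d" % k)
        # Primary replica: first node at or after the key's position,
        # wrapping around to the start of the ring.
        start = next(
            (i for i, n in enumerate(ring) if ring_position(n) >= pos), 0
        )
        for step in range(copies):
            held[ring[(start + step) % node_count]] += 1
    return {n: held[n] / keys for n in nodes}


print(stored_share(3, rf=3))  # every node reports 1.0
print(stored_share(3, rf=2))  # the shares add up to 2.0
```

With a replication factor equal to the number of nodes, every node stores every partition, so the cluster behaves like full replication even though the data is still placed by partition key.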

Upvotes: 1
