Reputation: 23
I'm new to Cassandra. I have studied it and run some tests on a Cassandra database, and I have some questions:
Given that Cassandra encourages denormalization and duplication of data, when data that is present in multiple column families is updated through just one of them, how is data consistency guaranteed?
Does the number of columns in a table affect query performance?
Is it true that the more records a query returns, the worse its performance?
In what circumstances is it useful to use MapReduce with Cassandra?
Upvotes: 2
Views: 182
Reputation: 185
Upvotes: 0
Reputation: 1653
Given that Cassandra encourages denormalization and duplication of
data, when data that is present in multiple column families is
updated through just one of them, how is data consistency guaranteed?
This was the very reason BATCH was introduced in Cassandra. Even with BATCH, you're still in a distributed system and need to think as such when modeling your data. Since you don't have a specific problem, we'll keep talking theoretically.
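To make the BATCH idea concrete, here is a minimal sketch that only builds the CQL text of a logged batch touching two denormalized tables. The table names (`users_by_id`, `users_by_email`), the column names, and the helper `build_logged_batch` are hypothetical and chosen for illustration; nothing here talks to a cluster.

```python
def build_logged_batch(statements):
    """Wrap CQL statements in BEGIN BATCH ... APPLY BATCH (logged by default)."""
    if not statements:
        raise ValueError("a batch needs at least one statement")
    # Normalize each statement so it ends with exactly one semicolon.
    body = "\n".join("  " + s.strip().rstrip(";") + ";" for s in statements)
    return "BEGIN BATCH\n" + body + "\nAPPLY BATCH;"


# Hypothetical schema: the same user's name is stored in two tables,
# each keyed for a different query.
batch = build_logged_batch([
    "UPDATE users_by_id SET name = 'Ada' WHERE user_id = 42",
    "UPDATE users_by_email SET name = 'Ada' WHERE email = 'ada@example.com'",
])
print(batch)
```

A logged batch like this ensures that either all of the statements are eventually applied or none are. It does not give isolation across partitions, so a reader can briefly see one table updated and the other not, which is why the data model still has to tolerate that window.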
Does the number of columns in a table affect query performance?
It's not so much the number of columns as it is the size of each individual partition. The larger the partition, the harder some of Cassandra's internal mechanisms (such as compaction) have to work. If you are not familiar with how data is stored on disk, I suggest taking a look at THIS tutorial.
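As a back-of-the-envelope illustration of thinking in partition size rather than column count, the sketch below multiplies rows per partition by an average row size. Both function names are hypothetical, the estimate ignores storage overhead and compression, and the 100 MB default budget is an assumption based on a commonly cited rule of thumb, not a hard limit.

```python
def estimate_partition_bytes(rows_per_partition, avg_row_bytes):
    """Very rough partition size: ignores per-cell overhead and compression."""
    return rows_per_partition * avg_row_bytes


def exceeds_budget(rows_per_partition, avg_row_bytes,
                   budget_bytes=100 * 1024 * 1024):
    """True if the rough estimate is above the (assumed) size budget."""
    return estimate_partition_bytes(rows_per_partition, avg_row_bytes) > budget_bytes


# One million 200-byte rows in a single partition is about 200 MB.
print(estimate_partition_bytes(1_000_000, 200))
print(exceeds_budget(1_000_000, 200))
print(exceeds_budget(100_000, 200))
```

If the estimate comes out too large, the usual remedy is to add another component to the partition key (for example a time bucket) so that rows spread over more, smaller partitions.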
Is it true that the more records a query returns, the worse its performance?
It's physics. More data = more IO, bandwidth, objects for GC to collect, etc. Given that Cassandra is built as a transactional datastore, it's not built for extremely large data returns or full table scans (very few truly distributed systems are). The tutorial linked above does a good job of explaining this.
In what circumstances is it useful to use MapReduce with Cassandra?
If you're interested in running analytics on Cassandra, I suggest using Spark, as a lot of work has gone into optimizing the integration between Spark and Cassandra at both the commercial and open source level. Once you're comfortable with how Cassandra works, I suggest taking a look at THIS tutorial if you want to do any sort of analytics on Cassandra. It covers the commercial offering, but the concepts and tutorials apply to the open source version as well.
Upvotes: 2