Reputation: 39951
In MySQL-land it's common to set up a read replica for reporting, business intelligence, data mining and other heavy workloads.
What is the equivalent in the world of Cassandra?
I've seen solutions where an additional datacenter is introduced, but I feel that dirties the production environment. Clients of either the reporting datacenter or the normal datacenter could, by mistake or by design, run queries with consistency level ALL.
I've also seen solutions where you just run all kinds of queries against the normal cluster, including all the heavy reporting. While I think this could be a nice solution, I'm not sure how to handle the load. BI typically generates tens of thousands of times the load of the normal, customer-driven queries.
So, if anyone has had to do something like this, I would love to hear your solutions and the arguments for them.
Upvotes: 4
Views: 1431
Reputation: 11638
I've seen solutions where an additional datacenter is introduced, but I feel that dirties the production environment. Clients of either the reporting datacenter or the normal datacenter could, by mistake or by design, run queries with consistency level ALL.
I think that in the general case users are able to accept this risk, knowing that they have control over their clients. However, if you are concerned about it, you could look at a solution like DataStax Enterprise's Advanced Replication feature, which allows you to replicate data unidirectionally to a remote cluster that is not in the same ring.
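To make the consistency-level concern concrete, here is a rough sketch in plain Python (no Cassandra driver involved) of how many replica acknowledgements each level needs when a keyspace is replicated to two datacenters. The datacenter names `production` and `reporting`, the replication factors, and the helper `replicas_required` are all assumptions made up for illustration; only the counting rules come from Cassandra's consistency-level definitions.

```python
def replicas_required(level, rf_by_dc, local_dc):
    """Return (acks_needed, local_only) for a consistency level.

    rf_by_dc maps a datacenter name to its replication factor
    (hypothetical values, as you would set with NetworkTopologyStrategy).
    local_only is True when the level can be satisfied using only
    replicas in the coordinator's own datacenter.
    """
    total_rf = sum(rf_by_dc.values())
    local_rf = rf_by_dc[local_dc]

    if level == "ALL":
        # Every replica in every datacenter must respond.
        return total_rf, False
    if level == "QUORUM":
        # Majority of all replicas, counted across datacenters.
        return total_rf // 2 + 1, False
    if level == "EACH_QUORUM":
        # A majority in each datacenter separately.
        return sum(rf // 2 + 1 for rf in rf_by_dc.values()), False
    if level == "LOCAL_QUORUM":
        # Majority of the replicas in the local datacenter only.
        return local_rf // 2 + 1, True
    if level == "LOCAL_ONE":
        return 1, True
    if level == "ONE":
        # One replica, which is not guaranteed to be a local one.
        return 1, False
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    # Assumed layout: 3 replicas for production, 2 for reporting.
    rf = {"production": 3, "reporting": 2}
    for level in ("ALL", "QUORUM", "EACH_QUORUM", "LOCAL_QUORUM", "LOCAL_ONE"):
        acks, local_only = replicas_required(level, rf, local_dc="reporting")
        scope = "local DC only" if local_only else "may involve the other DC"
        print(f"{level:13s} needs {acks} ack(s), {scope}")
```

Under these assumptions, a reporting client that sticks to `LOCAL_QUORUM` or `LOCAL_ONE` only waits on replicas in its own datacenter, whereas `ALL` waits on every replica in both, which is exactly the risk described in the question. Keeping clients on the `LOCAL_*` levels is a matter of client configuration, which is why having control over your clients matters here.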
I've also seen solutions where you just run all kinds of queries against the normal cluster, including all the heavy reporting. While I think this could be a nice solution, I'm not sure how to handle the load. BI typically generates tens of thousands of times the load of the normal, customer-driven queries.
That is true, and it is usually the motivation for having a separate data center: intensive BI workloads then do not impact nodes in the primary data center.
Upvotes: 2