Is there a way to configure a Cassandra cluster with data centre splitting (NetworkTopologyStrategy) and a replication factor of 1? Basically, I want to keep each piece of data on the node where it originated but still be able to query all of it from any node. The business use case is:
I have a group of customers, each of which is a different firm with data in its own data centre. I want to do some cross-firm data analysis without usable data leaving their premises, i.e. I can't get them all to load their data onto a central server. I am looking for a platform that lets me deploy software to each firm so that I can run distributed comparisons of their data without them having to send me their data in bulk (much of it is prohibited from distribution). Data transferred in a non-readable wire format as part of a distributed "join" is fine, as long as I'm not replicating the data to the other customers' data centres.
Yes, you can have a replication factor of 1. However, ensuring that each item of data stays on the node at a particular customer's site requires additional work. You will need to make the customer ID the partition key of every table, and write a custom partitioner that maps each customer ID to the token assigned to that customer. You will also have to manually configure each node to own only the one token for its customer (for example, `num_tokens: 1` and `initial_token` set to that customer's token in `cassandra.yaml`).
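To make the placement logic concrete, here is a minimal, self-contained simulation in plain Java. It is an illustrative sketch only: it does not implement Cassandra's actual partitioner interface, and the class name `CustomerTokenRing`, the customer IDs, node names and token values are all hypothetical. It models the two pieces described above: a "partitioner" that maps a customer ID straight to that customer's token, and a ring in which each node owns exactly one token. A node with token T owns the range (previous token, T], so a key whose token equals a node's token lands on that node, and with a replication factor of 1 that node is the only replica.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Illustrative simulation only, NOT Cassandra's partitioner API.
 * Hypothetical: one token per customer, and each node owns a single token.
 */
public class CustomerTokenRing {

    // Hypothetical "custom partitioner" table: customer ID -> token.
    private final Map<String, Long> customerTokens;
    // Token ring: token -> node that owns it (one token per node).
    private final NavigableMap<Long, String> ring = new TreeMap<>();

    public CustomerTokenRing(Map<String, Long> customerTokens, Map<String, String> customerNodes) {
        this.customerTokens = customerTokens;
        for (Map.Entry<String, Long> e : customerTokens.entrySet()) {
            // Each customer's node is configured with exactly that customer's token.
            ring.put(e.getValue(), customerNodes.get(e.getKey()));
        }
    }

    /** Maps the partition key (customer ID) directly to the customer's token. */
    public long tokenFor(String customerId) {
        Long token = customerTokens.get(customerId);
        if (token == null) {
            throw new IllegalArgumentException("Unknown customer: " + customerId);
        }
        return token;
    }

    /**
     * Ring ownership: the owner is the first node whose token is >= the key's
     * token, wrapping around to the lowest token. With RF 1 it is the only replica.
     */
    public String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("Ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Node that would store rows whose partition key is this customer ID. */
    public String nodeFor(String customerId) {
        return ownerOf(tokenFor(customerId));
    }

    public static void main(String[] args) {
        CustomerTokenRing r = new CustomerTokenRing(
                Map.of("firmA", 100L, "firmB", 200L, "firmC", 300L),
                Map.of("firmA", "node-at-firmA", "firmB", "node-at-firmB", "firmC", "node-at-firmC"));
        System.out.println(r.nodeFor("firmB")); // prints "node-at-firmB"
    }
}
```

Because every customer ID maps to exactly the token its own node owns, no row can fall into another firm's range. The `ownerOf` wrap-around is only there to mirror ring semantics; in this setup it is never reached for a known customer.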