Robin

Reputation: 1

Cassandra without replication

Is there a way to configure a Cassandra cluster with data centre splitting / NetworkTopologyStrategy / ReplicationFactor 1? Basically, I want to keep the data on its originating node but still be able to query all of it from any node. The business use case is:

I have a group of customers. Each is a different firm with data in its own data centres. I want to do some cross-firm data analysis without usable data leaving their premises, i.e. I can't get them all to load their data onto a central server. I am looking for a platform that lets me deploy software to each firm so that I can run distributed comparisons of their data without them having to send me their data in bulk (much of it is prohibited from distribution). Data transferred in a non-readable wire format as part of a distributed "join" is fine, as long as I'm not replicating the data to the other customers' data centres.

Upvotes: 0

Views: 238

Answers (1)

Raedwald

Reputation: 48692

Yes, you can have a replication factor of 1. However, ensuring that each item of data lives on the node at its customer's site requires additional work. You will need to make a customer ID the partition key of every table, and write a custom partitioner that maps each customer ID to a token for that customer. You will also have to manually configure each node to own only the single token belonging to its customer.
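To illustrate why this works, here is a minimal, self-contained Java model of the token ring under the assumptions above. It is not Cassandra code: the class CustomerTokenRing, its methods, the customer IDs and the token values are all hypothetical. It only shows the ownership rule. A node owns the token range (previous node token, its own token], so if each node is given exactly one token (in a real cluster, via num_tokens: 1 and initial_token in cassandra.yaml) and the partitioner maps a customer's ID to that same token, then with replication factor 1 the only replica of that customer's partitions is the customer's own node. A real partitioner would have to implement Cassandra's IPartitioner interface and be set through the partitioner option in cassandra.yaml, which is considerably more involved than this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of the scheme: customer ID -> fixed token, one token per
 * node, replication factor 1. Not a Cassandra IPartitioner implementation.
 */
public class CustomerTokenRing {
    // Hypothetical fixed assignment of customer IDs to tokens
    // (the job a custom partitioner would do).
    private final Map<String, Long> customerTokens = new TreeMap<>();
    // The ring: node token -> node name. Each node owns (previousToken, ownToken].
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Registers a customer and gives its on-premises node the same single token. */
    public void addCustomer(String customerId, long token, String node) {
        customerTokens.put(customerId, token);
        ring.put(token, node);
    }

    /** Maps a customer ID (the partition key) to its token. */
    public long tokenFor(String customerId) {
        Long token = customerTokens.get(customerId);
        if (token == null) {
            throw new IllegalArgumentException("Unknown customer: " + customerId);
        }
        return token;
    }

    /** With RF 1, the only replica is the first node whose token is >= the key's token. */
    public String ownerOf(String customerId) {
        long token = tokenFor(customerId);
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        // Standard ring wrap-around: tokens above the highest node token belong
        // to the node with the lowest token. (Never hit here, because every
        // customer token is also a node token.)
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CustomerTokenRing ring = new CustomerTokenRing();
        ring.addCustomer("firmA", -100L, "node-in-firmA-dc");
        ring.addCustomer("firmB", 0L, "node-in-firmB-dc");
        ring.addCustomer("firmC", 100L, "node-in-firmC-dc");
        for (String customer : new String[] {"firmA", "firmB", "firmC"}) {
            System.out.println(customer + " -> " + ring.ownerOf(customer));
        }
    }
}
```

Running main prints each firm paired with its own node, e.g. "firmB -> node-in-firmB-dc". The design choice that matters is that the customer's token and the node's token are identical, so no other node's range can ever contain that customer's data.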

Upvotes: 0
