Hope this is the right place to ask!
I am trying to set up a cluster with Spark, Cassandra and one more external tool. The external tool is executed in parallel across the cluster with the help of Spark (the pipe command), and it can write straight into the Cassandra database (see picture below) through a simple CQL INSERT statement. This means that on every node, the external tool sends its results straight to the Cassandra instance on that same node.
My wild guess/doubt/question is that each of these nodes will act as a coordinator node and, at the same time, will be responsible for distributing the data to the other nodes according to the primary/partition key. Is that right? If not, what will happen?
Each of the Cassandra nodes can act as a coordinator. If your tool is correctly configured to use TokenAwarePolicy, it should choose a replica of the target partition as the coordinator for each request, avoiding an extra network hop. If you insert in batches, try to batch together rows that share the same partition key.
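To make the routing and batching advice concrete, here is a toy simulation in plain Python, not driver code. It assumes one token per node (no vnodes), SimpleStrategy-style placement (walk the ring clockwise to collect replicas), and a stand-in MD5-based hash instead of Cassandra's real Murmur3Partitioner. The names Ring, group_into_batches and plan_writes are hypothetical. A real driver's TokenAwarePolicy does this lookup internally and may pick any replica, not always the first one.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def token_for(partition_key: str) -> int:
    # Stand-in hash (assumption): NOT Cassandra's Murmur3Partitioner,
    # just a deterministic signed 64-bit value derived from MD5.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Hypothetical token ring: one token per node, SimpleStrategy-like placement."""

    def __init__(self, node_tokens: dict[str, int], replication_factor: int):
        # Sorted (token, node) pairs; a node owns the range (previous token, its token].
        self.entries = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]
        self.rf = min(replication_factor, len(self.entries))

    def replicas_for_token(self, token: int) -> list[str]:
        # Primary replica: first node whose token >= the key's token, wrapping around.
        start = bisect_left(self.tokens, token) % len(self.entries)
        # Further replicas: keep walking clockwise around the ring.
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]

    def replicas(self, partition_key: str) -> list[str]:
        return self.replicas_for_token(token_for(partition_key))


def group_into_batches(rows, key_of, max_batch_size):
    # Group rows by partition key so each batch touches a single partition.
    groups = defaultdict(list)
    for row in rows:
        groups[key_of(row)].append(row)
    batches = []
    for key, items in groups.items():
        # Split large partitions into several bounded batches.
        for i in range(0, len(items), max_batch_size):
            batches.append((key, items[i:i + max_batch_size]))
    return batches


def plan_writes(rows, key_of, ring, max_batch_size=50):
    # Token-aware routing: send each single-partition batch to a replica
    # of that partition, so the coordinator can write locally.
    plan = []
    for key, batch in group_into_batches(rows, key_of, max_batch_size):
        coordinator = ring.replicas(key)[0]
        plan.append((coordinator, key, batch))
    return plan
```

For example, with three nodes and replication factor 2, a row is sent directly to a node that actually stores its partition, instead of to whichever local node happened to receive it:

```python
ring = Ring({"A": -2**62, "B": 0, "C": 2**62}, replication_factor=2)
rows = [("user1", 1), ("user2", 2), ("user1", 3)]
for coordinator, key, batch in plan_writes(rows, key_of=lambda r: r[0], ring=ring):
    print(coordinator, key, batch)
```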
Note that, compared with the setup in your diagram, you will get better performance if you write directly from Spark to Cassandra. You can use the spark-cassandra-connector for that.