user7982987

Reputation:

why spark internally uses batch writes to Cassandra

I am new to Spark, and I am trying to understand why Spark writes to Cassandra in batches (e.g. the saveToCassandra operation), when batches are not efficient for all use cases. Apart from optimizing the spark.cassandra properties, what should be taken care of on the Cassandra side or the Spark side when running a Spark job that reads from Cassandra and writes back to Cassandra?

Is it a logged batch write or an unlogged batch write?

Upvotes: 0

Views: 532

Answers (2)

Artem Aliev

Reputation: 1407

Here is a great explanation: Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, DataStax) | C* Summit 2016 https://www.youtube.com/watch?v=cKIHRD6kUOc

Upvotes: 2

KrazyGautam

Reputation: 2692

This is not very specific to Spark writing to Cassandra; it applies to any process writing to a service:

  1. Spark writes to Cassandra via the API, not as a file.
  2. Batching speeds up puts, because one API call carries multiple rows.
  3. Batching makes exactly-once semantics harder to guarantee.
  4. You can always write your own Spark task that does one put at a time.
  5. Single vs. batch writes should ideally be configurable.
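Regarding point 5, the Spark Cassandra Connector does expose batching behavior as configuration. A minimal sketch of the relevant write settings (property names are from the connector's reference documentation; defaults and exact behavior may vary by connector version, so check the docs for the version you run):

```shell
# Spark Cassandra Connector write-tuning settings, passed as Spark conf.
# Setting batch.size.rows=1 effectively disables batching (one row per write).
# grouping.key=partition groups batched rows by Cassandra partition key,
# which is generally the cheapest grouping for the coordinator.
spark-submit \
  --conf spark.cassandra.output.batch.size.rows=1 \
  --conf spark.cassandra.output.batch.grouping.key=partition \
  --conf spark.cassandra.output.concurrent.writes=5 \
  your-job.jar
```

The same properties can also be set programmatically on the SparkConf before the job starts.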

Upvotes: 1
