overexchange

Reputation: 1

Perform multiple inserts per POST request

We have a scenario where one insert happens per id_2 for a given id_1, using the schema below in Cassandra:

CREATE TABLE IF NOT EXISTS my_table (
  id_1        UUID,
  id_2        UUID,
  textDetails TEXT,
  PRIMARY KEY (id_1, id_2)
);

A single POST request body carries the details for multiple values of id_2, so one POST request triggers multiple inserts into a single table.

Each INSERT query is performed as shown below:

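// A lightweight-transaction insert: applied only if the row does not exist yet.
insertQueryString := "INSERT INTO my_table (id_1, id_2, textDetails) " +
    "VALUES (?, ?, ?) IF NOT EXISTS"
err := cassandra.Session.Query(insertQueryString, id1, id2, myTextDetails).Exec()
if err != nil {
    // handle the failed insert, e.g. fail the POST request
}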

1

Does Cassandra ensure data consistency across multiple inserts into a single table within one POST request? Each POST request is processed in its own goroutine. Subsequent GET requests must retrieve consistent data (as inserted through POST).

Using BATCH statements causes "Batch too large" errors in staging and production. See https://github.com/RBMHTechnology/eventuate/issues/166

2

We have two Cassandra data centers, with 3 replica nodes per data center.

What consistency levels need to be set for the write query (POST request) and the read query (GET request) to ensure full consistency?

Upvotes: 0

Views: 136

Answers (1)

Alex Ott

Reputation: 87244

There are multiple problems here:

  • Batching should be used very carefully in Cassandra: only when you're inserting data into the same partition. If you insert data into multiple partitions, it's better to use separate queries executed in parallel (though you can collect multiple entries per partition key and batch them).
  • You're using IF NOT EXISTS against the same partition. As a result, it leads to conflicts between multiple nodes (see the documentation on lightweight transactions), and it requires reading data from disk, so it heavily increases the load on the nodes. But do you really need to insert data only if the row doesn't exist? What is the problem if the row already exists? It's easier to just overwrite data in Cassandra when doing an INSERT, because that won't require reading data from disk.
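The "collect multiple entries per partition key" idea can be sketched in plain Go. This is only a planning step under stated assumptions: the Row type, groupByPartition, chunk, and planBatches are hypothetical names, UUIDs are held as strings to keep the example stdlib-only, and the row-count cap is a rough stand-in for the server-side limit, which is on batch size in bytes. Each returned slice touches a single partition, so it could be sent as one small batch, and independent slices could be executed in parallel.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one (id_1, id_2, textDetails) entry
// from the POST body. UUIDs are plain strings here (assumption).
type Row struct {
	ID1, ID2, TextDetails string
}

// groupByPartition buckets rows by partition key (id_1),
// keeping the original order inside each bucket.
func groupByPartition(rows []Row) map[string][]Row {
	groups := make(map[string][]Row)
	for _, r := range rows {
		groups[r.ID1] = append(groups[r.ID1], r)
	}
	return groups
}

// chunk splits rows into consecutive slices of at most size elements.
func chunk(rows []Row, size int) [][]Row {
	if size <= 0 {
		size = 1
	}
	var out [][]Row
	for start := 0; start < len(rows); start += size {
		end := start + size
		if end > len(rows) {
			end = len(rows)
		}
		out = append(out, rows[start:end])
	}
	return out
}

// planBatches returns single-partition groups of at most maxPerBatch rows.
// The order of partitions in the result is not deterministic (map iteration).
func planBatches(rows []Row, maxPerBatch int) [][]Row {
	var batches [][]Row
	for _, group := range groupByPartition(rows) {
		batches = append(batches, chunk(group, maxPerBatch)...)
	}
	return batches
}

func main() {
	rows := []Row{
		{"a", "1", "x"}, {"a", "2", "y"}, {"a", "3", "z"}, {"b", "4", "w"},
	}
	for _, b := range planBatches(rows, 2) {
		fmt.Printf("partition %s: %d rows\n", b[0].ID1, len(b))
	}
}
```

In the question's scenario every row of a POST request shares the same id_1, so the grouping yields one partition and only the chunking matters; the cap would need tuning against the actual row sizes.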

Regarding the consistency level: QUORUM (or SERIAL for LWTs) will give you strong consistency, but at the expense of increased latency (because you need to wait for an answer from the other DC) and a lack of fault tolerance: if you lose the other DC, all your queries will fail. In most cases LOCAL_QUORUM is enough (LOCAL_SERIAL in the case of LWTs), and it provides fault tolerance. I recommend reading this whitepaper on best practices for building fault-tolerant applications on top of Cassandra.
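The trade-off can be checked with a little arithmetic. The sketch below assumes a replication factor of 3 in each of the two data centers (the question only says 3 replica nodes per DC, so this is an assumption), and the helper names quorum and overlaps are made up for illustration. A quorum over n replicas is floor(n/2) + 1, and reads see the latest write when the write and read replica counts together exceed n.

```go
package main

import "fmt"

// quorum returns how many of n replicas must respond: floor(n/2) + 1.
func quorum(n int) int {
	return n/2 + 1
}

// overlaps reports whether w write acks and r read responses out of
// n replicas always share at least one replica (w + r > n).
func overlaps(w, r, n int) bool {
	return w+r > n
}

func main() {
	const dcs, rfPerDC = 2, 3 // assumed: RF 3 in each of 2 data centers
	total := dcs * rfPerDC

	fmt.Println("QUORUM needs", quorum(total), "of", total, "replicas")
	fmt.Println("LOCAL_QUORUM needs", quorum(rfPerDC), "of", rfPerDC, "local replicas")

	// With one DC down only rfPerDC replicas remain reachable.
	fmt.Println("QUORUM survives losing a DC:", rfPerDC >= quorum(total))

	// Writing and reading at LOCAL_QUORUM in the same DC still overlap.
	lq := quorum(rfPerDC)
	fmt.Println("LOCAL_QUORUM read sees LOCAL_QUORUM write:", overlaps(lq, lq, rfPerDC))
}
```

Under these assumptions QUORUM needs 4 of 6 replicas, which the 3 replicas of a single surviving DC cannot supply, while LOCAL_QUORUM needs only 2 of the 3 local replicas. The overlap guarantee for LOCAL_QUORUM holds only when the write and the later read go through the same data center.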

Upvotes: 2
