Reputation: 2301
We have an application that uses Cassandra as its data store. For easy access, the same data needs to be stored in multiple tables with different partition keys. BatchStatements are used to write the data to these multiple tables; the reason for using a batch is to make sure the data is written to all of the tables or to none of them.
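For illustration, here is roughly what one of our batches looks like (DataStax Java driver 3.x; the table and column names are placeholders, not our real schema):

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Same row stored under two different partition keys (id and email).
void writeUser(Session session, String id, String email, String name) {
    PreparedStatement byId = session.prepare(
            "INSERT INTO users_by_id (id, email, name) VALUES (?, ?, ?)");
    PreparedStatement byEmail = session.prepare(
            "INSERT INTO users_by_email (email, id, name) VALUES (?, ?, ?)");

    // A LOGGED batch guarantees that either all statements are eventually
    // applied or none are.
    BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
    batch.add(byId.bind(id, email, name));
    batch.add(byEmail.bind(email, id, name));
    session.execute(batch);
}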
With this setup, we recently started seeing a lot of write timeout errors as our user base grew. We came across many blogs and articles which mention that BatchStatements are often mistakenly used for writing to multiple partitions.
The reason seems to be the large load that multi-partition batches put on the coordinator node, which in turn causes latency. One option was to increase write_request_timeout_in_ms in cassandra.yaml beyond its default of 2 seconds. We tried this, but requests still failed, so we changed the setup to use executeAsync instead. With this, the WriteTimeoutExceptions went away completely.
But now the question is: how do we handle atomicity? Below is the code updated to use executeAsync. Is executeAsync the right alternative to batch statements? Is there any way rollbacks can be handled in the exception block?
try {
    // Wait for each per-table async write to complete.
    for (ListenableFuture<ResultSet> futureItem : futureItems) {
        futureItem.get(); // blocks; throws if that write failed
    }
} catch (Exception e) {
    // need to handle rollback?
}
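For reference, futureItems above is populated with one executeAsync call per table, along these lines (the statement variables here are placeholders):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.ListenableFuture;
import java.util.ArrayList;
import java.util.List;

// One independent async write per table. Because the writes are no longer
// batched, a failure in one table can leave the others already applied.
List<ListenableFuture<ResultSet>> futureItems = new ArrayList<>();
futureItems.add(session.executeAsync(insertByIdStatement));
futureItems.add(session.executeAsync(insertByEmailStatement));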
Upvotes: 5
Views: 528
Reputation: 96
NoSQL databases built for high availability and partition tolerance (the AP of CAP) are not made to provide strong referential integrity. Rather, they are designed to provide high-throughput, low-latency reads and writes. Cassandra itself has no concept of referential integrity across tables.
Batch inserts and lightweight transactions (LWT) work fine until they are used at scale. For your use case, you need to revisit how you use Cassandra and how you can design your data processing pipelines to give resilient writes to all the tables.
Think about decoupling all of these table writes and making them parallel, resilient pipelines using something like Kafka, and then persisting the data to the Cassandra tables. You can create exactly-once data pipelines and hence ensure referential data integrity. Cassandra does support a Kafka connector:
https://www.datastax.com/blog/2018/12/introducing-datastax-apache-kafkatm-connector
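To sketch the idea (the topic name, broker address, and payload shape here are assumptions for illustration): the application publishes each change once, and a consumer or sink connector then performs the per-table Cassandra writes with retries.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Publish the change event once instead of writing N tables directly.
void publishUserChange(String userId, String userJson) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "kafka:9092"); // assumed broker address
    props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put("acks", "all");                // durable acknowledgement
    props.put("enable.idempotence", "true"); // no duplicates on producer retry

    // A real application would reuse one long-lived producer instance.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        producer.send(new ProducerRecord<>("user-updates", userId, userJson));
    }
}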
Upvotes: 1
Reputation: 767
Ultimately, what you are asking for doesn't exist - by design.
For atomicity of writes, you already found the solution: batches. There is no alternative mechanism that gives you atomic writes across tables.
For strong consistency of data, which covers both writing and reading, you can set your write and read consistency levels so that they overlap and guarantee it (write consistency: LOCAL_QUORUM, read consistency: LOCAL_QUORUM).
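In driver terms (DataStax Java driver 3.x, matching the code in the question; the table name and values are illustrative), that is set per statement:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// LOCAL_QUORUM on both sides: every read quorum overlaps every write
// quorum in the local DC, so a read sees the latest acknowledged write.
Statement write = new SimpleStatement(
        "INSERT INTO users_by_id (id, name) VALUES (?, ?)", "42", "Ada")
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
Statement read = new SimpleStatement(
        "SELECT name FROM users_by_id WHERE id = ?", "42")
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);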
Many newer users and dev teams try to force relational-style rules onto Cassandra, but in time their use of Cassandra usually builds faith in its design, which allows for tunable consistency, reduced downtime, and scalability.
Upvotes: 1