Ravinder Payal

Reputation: 3031

Batch Insert in Lagom-Scala Cassandra readside

I searched on Google but didn't find any links other than the Cassandra read-side documentation page. So I just want to ask if there is any API or function already included in the Akka-Cassandra package for batch row insertion, or whether I have to call the insert code multiple times to insert multiple rows.

Note: I am not asking about inserting multiple events. I just want to store some JSON data in key-value pair format, so a single event containing a JSON object might need multiple rows. In PHP and other languages we can supply an array containing multiple rows, but how does Akka's Cassandra driver implementation offer this?

Upvotes: 0

Views: 462

Answers (2)

pazustep

Reputation: 441

CassandraSession exposes everything you need for batch writes, namely CassandraSession#prepare followed by CassandraSession#executeWriteBatch.

Something like this:

// Prepare the statement once, then bind it per row and add each
// BoundStatement to a batch (in the real API these calls are asynchronous):
PreparedStatement ps = session.prepare(...);
BatchStatement batch = new BatchStatement();
batch.add(ps.bind(...)); // one bind(...) per row to insert
batch.add(ps.bind(...));
session.executeWriteBatch(batch); // all rows are written in a single batch
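
Since the question is about the Scala API, here is a rough Scala sketch of the same idea. The kv_store table, its columns, and the insertPairs helper are hypothetical; prepare and executeWriteBatch are the CassandraSession calls mentioned above, and both are asynchronous:

import akka.Done
import com.datastax.driver.core.BatchStatement
import com.lightbend.lagom.scaladsl.persistence.cassandra.CassandraSession
import scala.concurrent.{ ExecutionContext, Future }

// Prepare once, bind once per row, then write the whole batch in one call
def insertPairs(session: CassandraSession, pairs: Map[String, String])
               (implicit ec: ExecutionContext): Future[Done] =
  session.prepare("INSERT INTO kv_store(key, value) VALUES (?, ?)").flatMap { ps =>
    val batch = new BatchStatement()
    pairs.foreach { case (k, v) => batch.add(ps.bind(k, v)) } // one row per key/value pair
    session.executeWriteBatch(batch)                          // single batch write
  }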

That said, notice that read side handlers built using CassandraReadSide need to return a List<BoundStatement> from the event handler methods. Lagom will automatically execute these statements in a batch.
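
For illustration, a minimal Scala read-side processor along those lines might look like the sketch below. KvEvent, JsonStored, the kv_store table, and the offset id are all hypothetical names; the point is that the event handler returns one BoundStatement per key/value pair and Lagom executes the whole list as a batch:

import akka.Done
import com.datastax.driver.core.PreparedStatement
import com.lightbend.lagom.scaladsl.persistence.cassandra.{ CassandraReadSide, CassandraSession }
import com.lightbend.lagom.scaladsl.persistence.{ AggregateEvent, AggregateEventTag, ReadSideProcessor }
import scala.concurrent.{ ExecutionContext, Future }

// Hypothetical event type: one event carries a whole JSON object as key/value pairs
sealed trait KvEvent extends AggregateEvent[KvEvent] {
  override def aggregateTag: AggregateEventTag[KvEvent] = KvEvent.Tag
}
object KvEvent { val Tag: AggregateEventTag[KvEvent] = AggregateEventTag[KvEvent] }
final case class JsonStored(pairs: Map[String, String]) extends KvEvent

class KvReadSideProcessor(readSide: CassandraReadSide, session: CassandraSession)
                         (implicit ec: ExecutionContext) extends ReadSideProcessor[KvEvent] {

  @volatile private var insert: PreparedStatement = _ // prepared once per processor instance

  override def buildHandler(): ReadSideProcessor.ReadSideHandler[KvEvent] =
    readSide.builder[KvEvent]("kvEventOffset")
      .setPrepare { _ =>
        session.prepare("INSERT INTO kv_store(key, value) VALUES (?, ?)").map { ps =>
          insert = ps
          Done
        }
      }
      .setEventHandler[JsonStored] { element =>
        // One BoundStatement per key/value pair; Lagom executes the whole list as one batch
        Future.successful(element.event.pairs.map { case (k, v) => insert.bind(k, v) }.toList)
      }
      .build()

  override def aggregateTags: Set[AggregateEventTag[KvEvent]] = Set(KvEvent.Tag)
}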

Upvotes: 1

ignasi35

Reputation: 925

Lagom's Read Side processes events one at a time. The only scenario where a batch insert across events would be possible is if you kept events in memory and persisted the batch after a timeout or once the set grew big enough. That approach is prone to data loss (at-most-once semantics): in case of a crash, the event stream would consider those events consumed even though the data held in memory was never persisted.

By default, Lagom processes each event in a single transaction that covers both the user-provided code updating the read-side tables and Lagom's own offset store. This approach allows for effectively-once read-side processing, as long as all the operations provided by the user happen within that transaction.

The suggested approach, at the moment, is to shard your persistent entity tag so that your persistent entity's event stream can be consumed by many read-side processor instances in parallel. With that solution, each instance still processes events one at a time, but many instances are distributed across your cluster.
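
As a rough illustration of that suggestion, reusing the hypothetical KvEvent from the earlier sketch and an arbitrarily chosen shard count:

import com.lightbend.lagom.scaladsl.persistence.{ AggregateEvent, AggregateEventShards, AggregateEventTag }

object KvEvent {
  // 10 shards is an arbitrary choice; it caps how many processor instances can run in parallel
  val Tag: AggregateEventShards[KvEvent] = AggregateEventTag.sharded[KvEvent](10)
}

sealed trait KvEvent extends AggregateEvent[KvEvent] {
  // Each persisted event gets one of the sharded tags, derived from the entity id
  override def aggregateTag: AggregateEventShards[KvEvent] = KvEvent.Tag
}

// In the ReadSideProcessor, subscribe to all shards:
// override def aggregateTags: Set[AggregateEventTag[KvEvent]] = KvEvent.Tag.allTags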

Upvotes: 0
