Developus

Reputation: 1462

Which solution is preferred when populating two Cassandra tables?

I have a performance question. I have two tables in Cassandra, both with exactly the same structure, and I need to save incoming data to both of them. The problem is which of these would be the better solution:

  1. Create two repositories, each opening its own Cassandra session, and save the data to both tables separately, all in code (see the sketch below).
  2. Save the data to one table, and have a trigger on that table copy the incoming data to the other one.
  3. Any other solution?

I think the first two are OK, but I am not sure if the first one is good enough. Can someone explain it to me?
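For reference, here is a rough sketch of what I mean by option 1, using the DataStax Java driver; the keyspace, table, and column names are made up for illustration:

import java.util.UUID;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class DualWrite {
  public static void main(String[] args) {
    // One shared session for the whole application; opening a separate
    // session per repository is expensive, so both would reuse this one.
    try (CqlSession session = CqlSession.builder().build()) {
      // Two hypothetical tables with identical structure.
      PreparedStatement insertA = session.prepare(
          "INSERT INTO ks.table_a (id, payload) VALUES (?, ?)");
      PreparedStatement insertB = session.prepare(
          "INSERT INTO ks.table_b (id, payload) VALUES (?, ?)");

      UUID id = UUID.randomUUID();
      String payload = "incoming data";

      // Two independent writes: if the second one fails,
      // the two tables can diverge.
      session.execute(insertA.bind(id, payload));
      session.execute(insertB.bind(id, payload));
    }
  }
}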

Upvotes: 1

Views: 71

Answers (2)

Erick Ramirez

Reputation: 16393

+1 to Aaron's response about using BATCH statements but the quoted example is specific to CQL. It's a bit more nuanced when implementing it in your app.

If you're using the Java driver, a typical INSERT statement would look like:

SimpleStatement simpleInsertUser = SimpleStatement.newInstance(
  "INSERT INTO users (...) VALUES (...)" );

And here's a prepared statement:

PreparedStatement psInsertUserByMobile = session.prepare(
  "INSERT INTO users_by_mobile (...) VALUES (...)" );

If you were to batch these 2 statements:

BatchStatement batch = BatchStatement.newInstance(
  DefaultBatchType.LOGGED,
  simpleInsertUser,
  psInsertUserByMobile.bind(...) );

session.execute(batch);

For item 2 in your list, I don't know of any companies that use Cassandra TRIGGERs in production, so they aren't something I would recommend. The feature was experimental for a while and I don't have enough experience with it to be able to recommend it for production.

For item 3, this is the use case that Materialized Views are trying to solve. They are certainly a lot simpler from a dev point-of-view since the table updates are done server-side instead of client-side.

They are OK to use if you don't have a lot of tables, but be aware that updates to the views happen asynchronously (not at the same time as the mutations on the base table). With MVs, there's also a risk that the views get so out-of-sync with the base table that the only solution is to drop and recreate the MV.
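To give you an idea of what that looks like, here's a sketch of creating a view with the Java driver; the users schema below is just an assumption based on the examples above, not something from your question:

// Sketch only: every primary key column of the base table must appear in
// the view's primary key, and every view key column must be filtered with
// IS NOT NULL. Assumes a hypothetical users table with primary key user_id
// and a mobile column.
session.execute(
    "CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_mobile_mv AS "
  + "SELECT * FROM users "
  + "WHERE mobile IS NOT NULL AND user_id IS NOT NULL "
  + "PRIMARY KEY (mobile, user_id)");

Once the view exists, your app only writes to the base table and Cassandra maintains the view for you (server-side, but asynchronously as noted above).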

If you prefer not to use BATCH statements, just make sure you're fully aware of the tradeoffs with using MVs. If you're interested, I've explained it in a bit more detail in https://community.datastax.com/articles/2774/. Cheers!

Upvotes: 0

Aaron

Reputation: 57808

This sounds like a good use case for BATCH. Essentially, you can assemble two write statements and execute them in a BATCH to ensure atomicity, which should keep the two tables in sync. Example below from the DataStax docs (URL).

cqlsh> BEGIN LOGGED BATCH
    INSERT INTO cycling.cyclist_names (cyclist_name, race_id) VALUES ('Vera ADRIAN', 100);
    INSERT INTO cycling.cyclist_by_id (race_id, cyclist_name) VALUES (100, 'Vera ADRIAN');
APPLY BATCH;

Upvotes: 1
