Reputation: 394
I am planning to ingest scientific measurement data into my 6-node Cassandra cluster using a Python script.
I have checked various posts and articles on bulk loading data into Cassandra, but unfortunately none of the state-of-the-art approaches discussed there fits my use case [1][2]. However, I found this post on Stack Overflow, which seemed quite helpful.
Considering that post and my billions of records, I would like to know whether combining PreparedStatement
(instead of SimpleStatement) with execute_async
is good practice.
Upvotes: 0
Views: 332
Reputation: 87234
Yes, that should work, but you need to throttle the number of async requests running simultaneously. The driver allows only a limited number of in-flight requests, and if you submit more than that, the extra requests will fail.
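For illustration, here is a minimal sketch of the semaphore-throttling pattern: at most MAX_IN_FLIGHT requests are ever outstanding. To keep the example runnable without a live cluster, `fake_execute_async` and `FakeFuture` below are stand-ins for `session.execute_async` and the driver's `ResponseFuture` (only its `add_callbacks` interface is mimicked); in real code you would pass a PreparedStatement and real parameters instead.

```python
from threading import Lock, Semaphore, Thread
import time

MAX_IN_FLIGHT = 4   # in real code, keep this well below the driver's limit

lock = Lock()
in_flight = 0       # requests currently outstanding
peak = 0            # highest concurrency observed
completed = 0

class FakeFuture:
    """Stand-in for the driver's ResponseFuture (illustration only)."""
    def add_callbacks(self, callback, errback):
        # Simulate the server replying shortly after submission.
        def reply():
            time.sleep(0.01)
            callback(None)
        Thread(target=reply).start()

def fake_execute_async(statement, params):
    """Stand-in for session.execute_async (illustration only)."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    return FakeFuture()

sem = Semaphore(MAX_IN_FLIGHT)

def release(_result_or_exc):
    """Shared success/error callback: free one throttle slot."""
    global in_flight, completed
    with lock:
        in_flight -= 1
        completed += 1
    sem.release()

for row in range(100):
    sem.acquire()  # blocks once MAX_IN_FLIGHT requests are pending
    future = fake_execute_async("INSERT ...", row)
    future.add_callbacks(release, release)

# Wait for the remaining requests to drain.
for _ in range(MAX_IN_FLIGHT):
    sem.acquire()

print("completed:", completed, "peak in-flight:", peak)
```

In the error callback of a real loader you would also log or retry the failed write instead of just releasing the slot.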
Another thing to think about: if you can organize the data into small UNLOGGED
batches where all entries share the same partition key, that could also improve the situation. See the documentation for examples of good and bad practices when using batches.
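A pure-Python sketch of the grouping step: bucket rows by partition key (assumed here to be the first element of each row tuple, a placeholder layout) and split each bucket into chunks of at most BATCH_SIZE. In driver code, each chunk would become one `BatchStatement(batch_type=BatchType.UNLOGGED)` from `cassandra.query`, filled via `batch.add(prepared_stmt, row)` and sent with `session.execute_async(batch)`.

```python
from collections import defaultdict

BATCH_SIZE = 50  # keep batches small; oversized batches load the coordinator

def single_partition_chunks(rows, batch_size=BATCH_SIZE):
    """Yield (partition_key, chunk) pairs where every row in a chunk
    shares the same partition key, so each chunk maps cleanly to one
    single-partition UNLOGGED batch."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0]].append(row)  # row[0]: assumed partition key
    for key, bucket in buckets.items():
        for i in range(0, len(bucket), batch_size):
            yield key, bucket[i:i + batch_size]

# Tiny demo with placeholder sensor data and batch_size=2:
rows = [("sensor-a", 1), ("sensor-a", 2), ("sensor-b", 3), ("sensor-a", 4)]
chunks = list(single_partition_chunks(rows, batch_size=2))
print(chunks)
```

Single-partition batches let the coordinator forward the whole batch to one replica set; multi-partition batches are the anti-pattern the documentation warns about.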
Upvotes: 2