Reputation: 394
I am planning to ingest scientific measurement data into my 6-node Cassandra cluster using a Python script.
I have checked various posts and articles on bulk loading data into Cassandra, but unfortunately none of the state-of-the-art approaches discussed there fits my use case [1][2]. However, I found this post on Stack Overflow, which seemed quite helpful.
Considering that post and my billions of records, I would like to know whether combining PreparedStatement
(instead of SimpleStatement) with execute_async
is good practice.
Upvotes: 0
Views: 332
Reputation: 87234
Yes, that should work, but you need to throttle the number of async requests running simultaneously. The driver allows only a limited number of in-flight requests, and if you submit more than that, the extra requests will fail.
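For illustration, here is a minimal sketch of the semaphore-throttling pattern: at most MAX_IN_FLIGHT requests are ever outstanding. To keep the example runnable without a live cluster, `fake_execute_async` and `FakeFuture` below are stand-ins for `session.execute_async` and the driver's `ResponseFuture` (only its `add_callbacks` interface is mimicked); in real code you would pass a PreparedStatement and real parameters instead.

```python
from threading import Lock, Semaphore, Thread
import time

MAX_IN_FLIGHT = 4   # in real code, keep this well below the driver's limit

lock = Lock()
in_flight = 0       # requests currently outstanding
peak = 0            # highest concurrency observed
completed = 0

class FakeFuture:
    """Stand-in for the driver's ResponseFuture (illustration only)."""
    def add_callbacks(self, callback, errback):
        # Simulate the server replying shortly after submission.
        def reply():
            time.sleep(0.01)
            callback(None)
        Thread(target=reply).start()

def fake_execute_async(statement, params):
    """Stand-in for session.execute_async (illustration only)."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    return FakeFuture()

sem = Semaphore(MAX_IN_FLIGHT)

def release(_result_or_exc):
    """Shared success/error callback: free one throttle slot."""
    global in_flight, completed
    with lock:
        in_flight -= 1
        completed += 1
    sem.release()

for row in range(100):
    sem.acquire()  # blocks once MAX_IN_FLIGHT requests are pending
    future = fake_execute_async("INSERT ...", row)
    future.add_callbacks(release, release)

# Wait for the remaining requests to drain.
for _ in range(MAX_IN_FLIGHT):
    sem.acquire()

print("completed:", completed, "peak in-flight:", peak)
```

In the error callback of a real loader you would also log or retry the failed write instead of just releasing the slot.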
Another thing to think about: if you can organize the data into small UNLOGGED
batches where all entries share the same partition key, that could also improve the situation. See the documentation for examples of good and bad practices when using batches.
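A pure-Python sketch of the grouping step: bucket rows by partition key (assumed here to be the first element of each row tuple, a placeholder layout) and split each bucket into chunks of at most BATCH_SIZE. In driver code, each chunk would become one `BatchStatement(batch_type=BatchType.UNLOGGED)` from `cassandra.query`, filled via `batch.add(prepared_stmt, row)` and sent with `session.execute_async(batch)`.

```python
from collections import defaultdict

BATCH_SIZE = 50  # keep batches small; oversized batches load the coordinator

def single_partition_chunks(rows, batch_size=BATCH_SIZE):
    """Yield (partition_key, chunk) pairs where every row in a chunk
    shares the same partition key, so each chunk maps cleanly to one
    single-partition UNLOGGED batch."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0]].append(row)  # row[0]: assumed partition key
    for key, bucket in buckets.items():
        for i in range(0, len(bucket), batch_size):
            yield key, bucket[i:i + batch_size]

# Tiny demo with placeholder sensor data and batch_size=2:
rows = [("sensor-a", 1), ("sensor-a", 2), ("sensor-b", 3), ("sensor-a", 4)]
chunks = list(single_partition_chunks(rows, batch_size=2))
print(chunks)
```

Single-partition batches let the coordinator forward the whole batch to one replica set; multi-partition batches are the anti-pattern the documentation warns about.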
Upvotes: 2