Reputation: 361
I receive CSV files, read them, and write their contents to Cassandra. I do this for a lot of data (roughly 10 million lines per day). The files themselves are fairly small (100 to 1000 lines each).
Before I write a row to the database, I want to check whether the primary key I'm about to insert already exists.
I know I can do it with SELECT count(*) FROM table WHERE key1 = something AND key2 = something_else.
But this is slow. I want to check in one go whether an entire file is going to affect data that is already in Cassandra, and I need it to be fast. Is there a way to achieve this? (Or something similar, like checking per batch whether it is going to affect existing rows.)
Upvotes: 0
Views: 1577
Reputation: 2321
You can use IF NOT EXISTS in INSERT statements and IF EXISTS in UPDATE statements. This performs better than counting rows, but it is slower than a plain insert without the check: these conditional statements are lightweight transactions, so Cassandra has to coordinate with the replicas that own the key to find out whether it already exists.
Documentation for INSERT: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html
and for UPDATE: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/update_r.html
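As a rough illustration of the IF NOT EXISTS approach, here is a minimal Python sketch. It assumes the DataStax Python driver, where `session.execute()` accepts `%s` placeholders and the result of a conditional statement exposes `was_applied`. The table `my_keyspace.my_table` and the columns `key1`, `key2`, and `value` are hypothetical names, not taken from the question.

```python
def insert_new_rows(session, rows):
    """Insert each (key1, key2, value) row only if its primary key is absent.

    Returns the rows that were skipped because the key already existed.
    """
    # Hypothetical keyspace, table, and column names; adjust to your schema.
    cql = (
        "INSERT INTO my_keyspace.my_table (key1, key2, value) "
        "VALUES (%s, %s, %s) IF NOT EXISTS"
    )
    skipped = []
    for row in rows:
        result = session.execute(cql, row)
        # For conditional statements the result reports whether the
        # write was applied; False means the key was already present.
        if not result.was_applied:
            skipped.append(row)
    return skipped
```

Calling `insert_new_rows(session, rows_from_one_file)` returns the rows whose keys were already in the table, which gives you the per-file "did this touch existing data" answer as a side effect of the insert. Each statement is still a separate lightweight transaction, so expect it to be noticeably slower than unconditional inserts.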
Upvotes: 1