Check whether primary keys already exist when loading a lot of data

Question

I receive CSV files, read them, and write them to Cassandra. I do this for a lot of data (roughly 10 million lines per day). The files themselves are fairly small (from 100 to 1000 lines each).

Before I write them to the database, I want to check whether the primary key I'm about to insert already exists.

I know I can do this one key at a time with SELECT count(*) FROM table WHERE key1 = something AND key2 = something_else.
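For illustration, the per-key check described above might look roughly like the sketch below. It assumes `session` is a connected Session from the DataStax Python driver (cassandra-driver), and that the table `my_table` and the columns `key1` and `key2` exist; those names are hypothetical and not taken from my actual schema. It issues one query per CSV line, which is exactly the part that is slow.

```python
import csv
import io

# Hypothetical table and column names; the real schema is not shown here.
CHECK_CQL = "SELECT count(*) FROM my_table WHERE key1 = ? AND key2 = ?"


def find_existing_keys(session, csv_text):
    """Return the (key1, key2) pairs in csv_text that already have a row.

    `session` is assumed to behave like a cassandra-driver Session:
    prepare() returns a statement, and execute() returns a result
    whose one() gives the first row.
    """
    prepared = session.prepare(CHECK_CQL)
    existing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["key1"], row["key2"])
        # One round trip per line of the file.
        count = session.execute(prepared, key).one()[0]
        if count > 0:
            existing.append(key)
    return existing
```

With files of 100 to 1000 lines, this means 100 to 1000 sequential queries per file.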

But this is slow. I want to check an entire file in one go to see whether it is going to affect data that is already in Cassandra, and I want (need) it to be fast. Is there a way to achieve this, or something similar, such as checking per batch whether it is going to affect existing rows?
