Reputation: 1780
I want to automate the ingestion of data from a source into a SnowFlake Cloud Database. There is no way to extract only unique rows from the source. So the entire data will be extracted during every ingestion run. However, while adding to SnowFlake I only want to add the unique rows. How can this be achieved most optimally?
Further Information: Source is a DataStax Cassandra Graph.
Upvotes: 0
Views: 849
Reputation: 31
You will have to stage it to a table in snowflake for ingestion and then move it to the destination table using select distinct.
Upvotes: 1
Reputation: 7369
Assuming there is a key that you can use to determine which records need to be loaded, the idea scenario would be to load the data to a stage table in Snowflake and then run a MERGE statement using the new data and apply to your target table.
https://docs.snowflake.com/en/sql-reference/sql/merge.html
If there is no key, you might want to consider running an INSERT OVERWRITE statement and just replacing the table with the new incoming data.
https://docs.snowflake.com/en/sql-reference/sql/insert.html#insert-using-overwrite
Upvotes: 1