Mark McGown
Mark McGown

Reputation: 1115

How to keep only unique rows in Snowflake?

I'm currently loading data from an API into Snowflake but can't find a good method of keeping only unique rows in a table. There is no guaranteed unique key from this API.

I've explored:

  1. One option I researched was to use the not matched clause, but it appears that's only when there's a key and not for the entire row at least as far as I can find. This method would've used a temp table to do the comparison.

  2. Another option would be to select distinct from the source table to the temp table and then load it back into the source. I don't like this method, albeit simple, due to there being a moment when the source table would be without data.

  3. I've also tried a few tests to make a hash and thereby create my own key representation of the row contents as an additional field before I load. This would probably be the more inefficient method and seems the most nonstandard. This is possible but as a last resort if I can't use the existing infrastructure I'm hoping has a solution.

If anyone has any working solution on any of these three path's with sample query I'd greatly appreciate it.

Pseudo-code for what I'm looking for

sql = "MERGE INTO INVOICES USING INVOICES_TEMP WHEN NOT MATCHED"
cur.execute(sql)

Edit: This way results in no errors but I can execute it over and over and it keeps adding the same rows to TABLE and doesn't block the duplicates like I want. So perhaps I have the syntax down but my usage of the table/fields is incorrect.

MERGE INTO TABLE USING TABLE_TEMP ON TABLE.FIELD1=TABLE_TEMP.FIELD1
 AND TABLE.FIELD2=TABLE_TEMP.FIELD2 AND TABLE.FIELD3=TABLE_TEMP.FIELD3
 WHEN NOT MATCHED THEN INSERT (FIELD1,FIELD2,FIELD3) VALUES
 (TABLE_TEMP.FIELD1,TABLE_TEMP.FIELD2,TABLE_TEMP.FIELD3)

Upvotes: 1

Views: 1125

Answers (1)

Marcel
Marcel

Reputation: 2622

If I understood your problem correct, I'd prefer your first option. You can extend the column-comparison within the MERGE-statement to all columns so you are not limited to the keys only.

merge into target_table using source_table 
    on target_table.col1 = source_table.col1
    and target_table.col2 = source_table.col2
    and target_table.col3 = source_table.col3
    ...

More infos about merge: https://docs.snowflake.com/en/sql-reference/sql/merge.html

Upvotes: 1

Related Questions