Py1996
Py1996

Reputation: 239

how to drop duplicates based on specific column in snowflake

I have table called "Mytable" as below(just an example), it has two columns Hash and last_checked.

Hash    last_checked
1       2021-04-01T12:14:00+00:00
1       2021-04-02T12:14:00+00:00
1       2021-04-03T12:14:00+00:00
2       2021-04-01T12:14:00+00:00
2       2021-04-02T12:14:00+00:00
2       2021-04-03T12:14:00+00:00

I want to delete duplicate hash values based on the last_checked column/field. I need to retain latest last_checked date values. output table as below

Hash    last_checked
1       2021-04-03T12:14:00+00:00
2       2021-04-03T12:14:00+00:00

I didnot find answer anywhere, so hoping I may get answer here.Thank you

Upvotes: 0

Views: 414

Answers (1)

akshindesnowflake
akshindesnowflake

Reputation: 601

Assuming you have HASHTABLE AS TABLE With Columns Hash & last_checked

Using Subquery from this HASHTABLE after USING Predicate , Also Used Qualify Predicate to filter on Windows Functions.

DELETE FROM  HASHTABLE   USING 
(

 SELECT HASHVAL,last_checked FROM HASHTABLE Qualify  ROW_NUMBER()  OVER (PARTITION BY HASHVAL ORDER BY last_checked  DESC)  !=1   

) AS

HASHTABLE_DERIVED   

WHERE  HASHTABLE.HASHVAL=HASHTABLE_DERIVED.HASHVAL AND HASHTABLE.last_checked=HASHTABLE_DERIVED.last_checked

Upvotes: 1

Related Questions