I've a table Order i my Oracle DB. Somehow the insert job ran twice and created duplicate records with only difference in timestamp: ID NAME ITEM DATE 1 Ric POTATO 06-07-21 12:35:27.048000000 PM 2 Ric POTATO 06-07-21 12:35:27.048100000 PM 3 Ashley KALE 06-07-21 12:35:27.049000000 PM 4 Ashley KALE 06-07-21 12:35:27.049100000 PM 5 Adam WATER 06-07-21 12:36:27.050000000 PM 5 Adam WATER 06-07-21 12:36:27.050100000 PM Now I want to remove the most recent duplicate records. How to create a query to delete the records where timestamp is latest for (Name, Item) ?

sqloracle-databasegreatest-n-per-groupsql-delete

Reputation: 904

SQL - Delete most recent duplicate records

I've a table Order i my Oracle DB. Somehow the insert job ran twice and created duplicate records with only difference in timestamp:

ID	NAME	ITEM	DATE
1	Ric	POTATO	06-07-21 12:35:27.048000000 PM
2	Ric	POTATO	06-07-21 12:35:27.048100000 PM
3	Ashley	KALE	06-07-21 12:35:27.049000000 PM
4	Ashley	KALE	06-07-21 12:35:27.049100000 PM
5	Adam	WATER	06-07-21 12:36:27.050000000 PM
5	Adam	WATER	06-07-21 12:36:27.050100000 PM

Now I want to remove the most recent duplicate records. How to create a query to delete the records where timestamp is latest for (Name, Item)?

Upvotes: 0

Answers (2)

Marmite Bomber

Reputation: 21075

Provided that the column NAME, ITEM and "DATE" (you'd better not use reserved words) are unique keys you may use following query

DELETE tab
 WHERE (NAME, ITEM, "DATE") IN
       (
        WITH del AS 
        (
         SELECT ID,
                NAME,
                ITEM,
                "DATE",
                ROW_NUMBER() OVER ( PARTITION BY NAME, ITEM ORDER BY "DATE") AS rn
           FROM tab
         )  
       SELECT NAME,ITEM,"DATE" 
         FROM del 
        WHERE rn > 1
       );

This guarantees that for each combination of NAME, ITEM only the row with the first "DATE" (rn = 1) preserves and all other will be deleted.

This query works even if there are keys with triplets or rows without duplication.

The unique key on NAME, ITEM and "DATE" is important only to make the query deterministic.

Upvotes: 2

Gordon Linoff

Reputation: 1269873

You can use:

delete from orders o
    where o.date > (select min(o2.date)
                    from orders o2
                    where o2.name = o.name and o2.item = o.item
                   );

Or if you want to use not exists, you can delete rows for which there is an earlier row:

delete from orders o
    where exists (select 1
                  from orders o2
                  where o2.name = o.name and
                        o2.item = o.item and
                        o2.date < o.date
                 );

Only the "first" row doesn't have an earlier row.

Upvotes: 0

SQL - Delete most recent duplicate records

Answers (2)

Related Questions