davidjwest
davidjwest

Reputation: 546

Remove Duplicates But Leave One Row For Each

I have the following table:

ONBackup Table:

Contract    FromDate    Invoice     Data
232         12/12/2017  123
232         14/02/2018  123
232         15/07/2018  123
232         14/02/2017  676
311         12/12/2017  881

There are lots of "duplicate" rows, a duplicate to me is where the Invoice numbers are the same, ie the other fields can be different.

The table has 1.4 million rows (and around a million duplicates), so not sure if the below would work or not as I am getting bored waiting for 3 hours and counting, it must be more CPU hungry than I though.

DELETE FROM ONBackup
WHERE Invoice NOT IN
(
    SELECT MIN(Invoice)
    FROM ONBackup
    GROUP BY Invoice
)

Is there a quicker way of doing this that will work?

Upvotes: 0

Views: 57

Answers (3)

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

Use row_number() function :

delete b
from (select b.*, row_number() over (partition by b.invoice order by b.fromdate desc) as seq 
      from ONBackup b
     ) b
where seq > 1;

This leaves latest fromdate for each invoices.

Upvotes: 5

Chanukya
Chanukya

Reputation: 5893

  DELETE A
FROM
(
  select  *,row_number() over (partition by invoice order by invoice)as rn from 
  table1
) A
WHERE A.rn > 1

Upvotes: 3

SQL_M
SQL_M

Reputation: 2475

I think a CTE is a good option here: (note that you must end the previous statement with a semi-colon).

WITH CTE AS 

(
  SELECT Invoice, ROW_NUMBER() OVER (PARTITION BY INVOICE ORDER BY SELECT '1') AS RowNumb
  FROM ONBackup
)

DELETE FROM CTE WHERE RowNumb > 1

Upvotes: 4

Related Questions