dicaprio
dicaprio

Reputation: 733

Oracle How delete duplicates based on multiples database columns and conditions

I have a very huge table with data collected from multiple systems. I now have to delete duplicate records based on conditions and multiple columns.

Here is an example:

 +---------------+-------------+------+-----+--------
| System ID     | Debt Num    | Exp Dt | Account NO |
+---------------+-------------+------+-----+---------
| pay           | 2222        | 0114   |   111      |
| pay           | 2222        | 0214   |   111      |
| Online        | 2222        | 0214   |   111      |
| Online        | 3333        | 0115   |   222      |
| Online        | 3333        | 0116   |   222      |
| ERP           | 2222        | 0214   |   111      | 
| ERP           | 4444        | 0114   |   333      | 
+---------------+-------------+------+-----+--------

From the above data , delete the duplicates with the following conditions satisfied.

I have tried multiple queries found online , like group by, row_number over but all fulfill just one condition.

Appreciate your help to contribute your thoughts and suggestions.

EDIT: Gordon's query works fine and fits my requirements, however when I ran the same on staging which contains 540K rows, it error out ORA-00600 Internal error.

Upvotes: 0

Views: 1191

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271051

I think you can do this with rowid and a correlated subquery:

delete from payinfo_staging_db
   where rowid <> (select rowid
                   from (select rowid
                         from payinfo_staging_db t2
                         where t2.debitNum = payinfo_staging_db.debitNum and
                               t2.accountNo = payinfo_staging_db.accountNo
                         order by t2.exp_dt,
                                  (case when t2.SystemId = 'Pay' then 1
                                        when t2.SystemId = 'Online' then 2
                                        when t2.SystemId = 'ERP' then 3
                                   end)
                        ) r
                    where rownum = 1
                  );

EDIT:

There must be a problem with nested references in Oracle. The following works (at least in the sense that it parses and executes correctly):

delete from payinfo_staging_db
   where rowid <> (select min(rowid) keep (dense_rank first order by exp_dt desc,
                                                            (case when t2.SystemId = 'Pay' then 1
                                                                  when t2.SystemId = 'Online' then 2
                                                                  when t2.SystemId = 'ERP' then 3
                                                             end)
                                          ) as therowid
                   from payinfo_staging_db t2
                   where t2.debitNum = payinfo_staging_db.debitNum and
                         t2.accountNo = payinfo_staging_db.accountNo
                  );

The SQL Fiddle is here.

Upvotes: 1

Related Questions