Shitalb1
Shitalb1

Reputation: 11

Write a query to get rid of duplicate records in oracle database with below mentioned criteria:

Criteria : 1) unique combination of 2 columns(column1,column2)

2) keep oldest one out of that combination

3) records might be same i.e. same column1, column2 and creation date in that case need the one which has lesser id.

e.g. data is as below:

ID      column1     column2     creation_date(dd-mm-yyyy)       
1       11          aa          10/5/2016
2       11          aa          11/6/2016
3       12          bb          10/5/2017
4       12          bb          20-05-2017
5       12          cc          10/5/2016
6       12          cc          11/5/2017
7       13          dd          10/1/2018
8       13          dd          10/1/2018

I need to keep records with id: 1,3,5,7

Approach I am thinking of is:

a) first write select query to get required records (in this example 1,3,5,7)

b) write update query to change status to deleted using update query(soft delete)

Also please suggest if any other better approach to fulfill the criteria.

Additional information: *total number of records: 11k

*I don't want to get records directly from table rather than that I have a query which fetches only required data, need to run query on those records

*Final aim is to modify status of duplicate records to deleted and append deleted word to those records

Upvotes: 0

Views: 72

Answers (2)

Shitalb1
Shitalb1

Reputation: 11

So final queries which worked for my question are as below:

1) to get count of records/ to get required columns:

SELECT --count (*) -use this to get count of records
ID, COLUMN1, COLUMN2,CREATION_DATE --required columns
FROM
MY_TABLE
WHERE
ROWID IN( 
        select duplicate_rowid 
          from (select rowid as duplicate_rowid
                      ,row_number() over(  
                         partition by COLUMN1, COLUMN2      --  criterion 1
                             ORDER BY CREATION_DATE ASC   -- criterion 2
                                     ,ID   ASC          -- criterion 3

                      ) AS RNK
                  from MY_TABLE
                )
         WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);

2) to update records with status=deleted and append _deleted word to column1 values:

    UPDATE MY_TABLE
    SET STATUS='deleted' , COLUMN1=CONCAT(COLUMN1,'_deleted') 
    WHERE
    ROWID IN( 
                select duplicate_rowid 
                  from (select rowid as duplicate_rowid
                              ,row_number() over(  
                                 partition by COLUMN1, COLUMN2      --  criterion 1
                                     ORDER BY CREATION_DATE ASC   -- criterion 2
                                             ,ID   ASC          -- criterion 3

                              ) AS RNK
                          from MY_TABLE
                        )
                 WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);

Upvotes: 0

Ronnis
Ronnis

Reputation: 12833

This is really straight forward if you use analytic functions. The query has three parts:

A) Assign a rank to each record like this: Group records by column1 and column2. Within each group, sort the records first by creation_date and then by ID. Assign 1 to the first record, 2 to the second and so on.

B) Keep only the duplicates, i.e. the records with newer creation_date and/or ID. The record with rnk = 1 would be your requested record. Records with rnk > 1 are the duplicates.

C) Using ROWID, delete the duplicates

delete
  from your_table
 where rowid in(-- (C) 
        select duplicate_rowid
          from (select rowid as duplicate_rowid
                      ,row_number() over( -- (A) 
                         partition by column1, column2    -- Your criterion 1
                             order by creation_date asc   -- Your criterion 2
                                     ,id            asc   -- Your criterion 3
                      ) as rnk
                  from your_table
                )
         where rnk > 1  -- (B) 
       );

Upvotes: 0

Related Questions