Reputation: 11
Criteria : 1) unique combination of 2 columns(column1,column2)
2) keep oldest one out of that combination
3) records might be same i.e. same column1, column2 and creation date in that case need the one which has lesser id.
e.g. data is as below:
ID column1 column2 creation_date(dd-mm-yyyy)
1 11 aa 10/5/2016
2 11 aa 11/6/2016
3 12 bb 10/5/2017
4 12 bb 20-05-2017
5 12 cc 10/5/2016
6 12 cc 11/5/2017
7 13 dd 10/1/2018
8 13 dd 10/1/2018
I need to keep records with id: 1,3,5,7
Approach I am thinking of is:
a) first write select query to get required records (in this example 1,3,5,7)
b) write update query to change status to deleted using update query(soft delete)
Also please suggest if any other better approach to fulfill the criteria.
Additional information: *total number of records: 11k
*I don't want to get records directly from table rather than that I have a query which fetches only required data, need to run query on those records
*Final aim is to modify status of duplicate records to deleted and append deleted word to those records
Upvotes: 0
Views: 72
Reputation: 11
So final queries which worked for my question are as below:
1) to get count of records/ to get required columns:
SELECT --count (*) -use this to get count of records
ID, COLUMN1, COLUMN2,CREATION_DATE --required columns
FROM
MY_TABLE
WHERE
ROWID IN(
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over(
partition by COLUMN1, COLUMN2 -- criterion 1
ORDER BY CREATION_DATE ASC -- criterion 2
,ID ASC -- criterion 3
) AS RNK
from MY_TABLE
)
WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);
2) to update records with status=deleted and append _deleted word to column1 values:
UPDATE MY_TABLE
SET STATUS='deleted' , COLUMN1=CONCAT(COLUMN1,'_deleted')
WHERE
ROWID IN(
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over(
partition by COLUMN1, COLUMN2 -- criterion 1
ORDER BY CREATION_DATE ASC -- criterion 2
,ID ASC -- criterion 3
) AS RNK
from MY_TABLE
)
WHERE (RNK > 1 and COLUMN1 IS NOT NULL and COLUMN2 IS NOT NULL)
);
Upvotes: 0
Reputation: 12833
This is really straight forward if you use analytic functions. The query has three parts:
A) Assign a rank to each record like this: Group records by column1 and column2. Within each group, sort the records first by creation_date and then by ID. Assign 1 to the first record, 2 to the second and so on.
B) Keep only the duplicates, i.e. the records with newer creation_date and/or ID. The record with rnk = 1 would be your requested record. Records with rnk > 1 are the duplicates.
C) Using ROWID, delete the duplicates
delete
from your_table
where rowid in(-- (C)
select duplicate_rowid
from (select rowid as duplicate_rowid
,row_number() over( -- (A)
partition by column1, column2 -- Your criterion 1
order by creation_date asc -- Your criterion 2
,id asc -- Your criterion 3
) as rnk
from your_table
)
where rnk > 1 -- (B)
);
Upvotes: 0