Reputation: 27
I have a table where I would like to identify duplicate records based on two columns(id and role) and I use a third column (unit) to select a subset of records to analyze and do the deletion within. Here comes the table and a few rows a example data:
id | role | unit
----------------
946| 1001 | 1
946| 1002 | 1
946| 1003 | 1
946| 1001 | 2
946| 1002 | 2
900| 1001 | 3
900| 1002 | 3
900| 1001 | 3
An analysis of unit 1 and 2 should identify two rows to delete 946/1001 and 946/1002. It doesn't matter if I delete the rows labeled unit 1 or 2. In a subsequent step I will update all records labeled unit=2 to unit=1.
I have a select statement capable to identify the rows to delete:
SELECT * FROM (SELECT
unit,
id,
role,
ROW_NUMBER() OVER (
PARTITION BY
id,
role
ORDER BY
id,
role
) row_num
FROM thetable WHERE unit IN (1,2) ) as x
WHERE row_num > 1;
This query will give this result:
id | role | unit
----------------
946| 1001 | 2
946| 1002 | 2
Now I would like to combine this with DELETE to delete the identified records. I have come pretty close (I believe) with this statement:
DELETE FROM thetable tp1 WHERE EXISTS
(SELECT
unit,
id,
role,
ROW_NUMBER() OVER (
PARTITION BY
id,
role
ORDER BY
id,
role
) as row_num
FROM
thetable tp2
WHERE unit IN (1,2) AND
tp1.unit=tp2.unit AND
tp1.role=tp2.role AND
tp1.id=tp2.id AND row_num >1
)
However, the row_num is not recognized as column. So how should I modify this statement to delete the two identified records?
Upvotes: 2
Views: 54
Reputation: 164069
It is very simple with EXISTS
:
DELETE FROM thetable t
WHERE t.unit IN (1,2)
AND EXISTS (
SELECT 1 FROM thetable
WHERE (id, role) = (t.id, t.role) AND unit < t.unit
)
See the demo.
Results:
> id | role | unit
> --: | ---: | ---:
> 946 | 1001 | 1
> 946 | 1002 | 1
> 946 | 1003 | 1
> 900 | 1001 | 3
> 900 | 1002 | 3
> 900 | 1001 | 3
Upvotes: 2
Reputation: 222412
You could phrase this as:
delete from thetable t
where t.unit > (
select min(t1.unit)
from thetable t1
where t1.id = t.id and t1.role = t.role
)
This seems like a simple way to solve the assignment, basically phrasing as: delete rows for which another row exists with a smaller unit
and the same id
and role
.
As for the query you wanted to write, using row_number()
, I think that would be:
delete from thetable t
using (
select t.*, row_number() over(partition by id, role order by unit) rn
from mytable t
) t1
where t1.id = t.id and t1.role = t.role and t1.unit = t.unit and t1.rn > 1
Upvotes: 1