Reputation: 57
I need some help with a more complex query. So let's say we have a table tags
which contains id
, value
, category
and we have a lot of duplicates in the value
column like aaa
, AaA
, etc.
How would a query look like that deletes all duplicates except one, so let's say the end result will be aaa
, bbb
, etc. Also each record can have a foreign_key in other tables, because tags are used in other entities.
I am really stuck at this point. The flow would be in my mind:
Upvotes: 1
Views: 1116
Reputation: 23676
The row_number()
window function adds a row count to each record of a group. So, your group could be category and in there all values which equals, after they were converted to lowerCase (lower()
). The duplicates now are all values with a row count number >= 2.
SELECT
id
FROM (
SELECT
*,
row_number() OVER (PARTITION BY category, lower(values)) as row_count
FROM t
)s
WHERE row_count >= 2
Use the query above in a DELETE
statement
DELETE FROM t
WHERE id IN (
SELECT
id
FROM (
SELECT
*,
row_number() OVER (PARTITION BY category, lower(values)) as row_count
FROM t
)s
WHERE row_count >= 2
);
The tricky part is the FOREIGN KEY problem. Without knowing your actual table structure it is not possible to tell how to solve it. If it is designed well, your FK contraint contains an ON UPDATE OR DELETE
part, which leads to automatically removing the references dat.
Upvotes: 2