Remove duplicates by case insensitive and remove all references in other tables

Question

I need some help with a more complex query. So let's say we have a table tags which contains id, value, category and we have a lot of duplicates in the value column like aaa, AaA, etc.

How would a query look like that deletes all duplicates except one, so let's say the end result will be aaa, bbb, etc. Also each record can have a foreign_key in other tables, because tags are used in other entities.

I am really stuck at this point. The flow would be in my mind:

Get all tags with same value (lower each one, to make them case sensitive) except one from each group
By those found records delete all records of said tag in other tables
Finally delete these found records and leave only one example of each (which would have been excluded in step 1)

S-Man · Accepted Answer

demo:db<>fiddle

Finding duplicates:

The row_number() window function adds a row count to each record of a group. So, your group could be category and in there all values which equals, after they were converted to lowerCase (lower()). The duplicates now are all values with a row count number >= 2.

SELECT 
    id
FROM (
    SELECT
        *,
        row_number() OVER (PARTITION BY category, lower(values)) as row_count
    FROM t
)s
WHERE row_count >= 2

Delete duplicates:

Use the query above in a DELETE statement

DELETE FROM t
WHERE id IN (
    SELECT 
        id
    FROM (
        SELECT
            *,
            row_number() OVER (PARTITION BY category, lower(values)) as row_count
        FROM t
    )s
    WHERE row_count >= 2
);

The tricky part is the FOREIGN KEY problem. Without knowing your actual table structure it is not possible to tell how to solve it. If it is designed well, your FK contraint contains an ON UPDATE OR DELETE part, which leads to automatically removing the references dat.

Remove duplicates by case insensitive and remove all references in other tables

Answers (1)

Related Questions