SQL query for rows migration to distinct occurences of their relations

Question

I have following data structure containing sample data:

Groups Table

id (PK)	name	foreign_id
A	DUPLICATED_NAME	X
B	DUPLICATED_NAME	X
C	string	Y
D	string	Z

Items Table (related to Groups)

id (PK)	group_id
uuid value1	A
uuid value2	B
uuid value3	C
uuid value4	D

My goal is to get rid of id column in Groups and make name and foreign_id columns new composite primary key.

For this, I need to migrate existing items to distinct occurrences of groups. In example I've posted it means, that I need to delete group which id = B because of DUPLICATED_NAME clashing with group A, and then reassign all items with group_id = B to A, since B no longer exists, and so on, for all rows.

I'm using PostgresSQL and Liquibase.

S-Man · Accepted Answer

step-by-step demo:db<>fiddle

UPDATE items i
SET group_id = s.first_value  -- 3
FROM (
    SELECT
        i.*,
        g.first_value
    FROM items i
    JOIN (
        SELECT
            *,
            -- 1
            first_value(id) OVER (PARTITION BY name, foreign_key ORDER BY id)
        FROM groups
    ) g ON g.id = i.group_id  -- 2
) s
WHERE i.id = s.id;

Use the first_value() window function to find the first id value for the duplicates group, which is added as only value to all records of the group (= partition)
Join the group table on the items table by the old group_id
Update the items table using the first_value

If you want to execute the DELETE statement for the groups table in the same query, you can use CTEs (WITH clauses) for that:

demo:db<>fiddle

WITH new_group_id AS (
    SELECT                                  -- 1
        i.*,
        g.id as g_id,
        g.first_value
    FROM items i
    JOIN (
        SELECT
            *,
            first_value(id) OVER (PARTITION BY name, foreign_key ORDER BY id)
        FROM groups
   ) g ON g.id = i.group_id
), update_items AS (
   UPDATE items i                           -- 2
   SET group_id = s.first_value
   FROM (SELECT * FROM new_group_id) s
   WHERE s.id = i.id
)
DELETE FROM groups                          -- 3
WHERE id IN (
    SELECT g_id
    FROM new_group_id
    WHERE g_id <> first_value
);

First calculate the new group_id as we did above. Store the result in a CTE because we reuse it later twice.
Do the UPDATE statement like we did above. Only difference is that we do not use a subquery but the previously calculated CTE result
After updating we can use the query once more to find the records which should be deleted (every record where the first_value is not the same as the actual group's id). Execute the DELETE statement.

SQL query for rows migration to distinct occurences of their relations

Answers (1)

Related Questions