SaszaKonopka
SaszaKonopka

Reputation: 63

SQL query for rows migration to distinct occurences of their relations

I have following data structure containing sample data:

Groups Table

id (PK) name foreign_id
A DUPLICATED_NAME X
B DUPLICATED_NAME X
C string Y
D string Z

Items Table (related to Groups)

id (PK) group_id
uuid value1 A
uuid value2 B
uuid value3 C
uuid value4 D

My goal is to get rid of id column in Groups and make name and foreign_id columns new composite primary key.

For this, I need to migrate existing items to distinct occurrences of groups. In example I've posted it means, that I need to delete group which id = B because of DUPLICATED_NAME clashing with group A, and then reassign all items with group_id = B to A, since B no longer exists, and so on, for all rows.

I'm using PostgresSQL and Liquibase.

Upvotes: 2

Views: 121

Answers (1)

S-Man
S-Man

Reputation: 23676

step-by-step demo:db<>fiddle

UPDATE items i
SET group_id = s.first_value  -- 3
FROM (
    SELECT
        i.*,
        g.first_value
    FROM items i
    JOIN (
        SELECT
            *,
            -- 1
            first_value(id) OVER (PARTITION BY name, foreign_key ORDER BY id)
        FROM groups
    ) g ON g.id = i.group_id  -- 2
) s
WHERE i.id = s.id;
  1. Use the first_value() window function to find the first id value for the duplicates group, which is added as only value to all records of the group (= partition)
  2. Join the group table on the items table by the old group_id
  3. Update the items table using the first_value

If you want to execute the DELETE statement for the groups table in the same query, you can use CTEs (WITH clauses) for that:

demo:db<>fiddle

WITH new_group_id AS (
    SELECT                                  -- 1
        i.*,
        g.id as g_id,
        g.first_value
    FROM items i
    JOIN (
        SELECT
            *,
            first_value(id) OVER (PARTITION BY name, foreign_key ORDER BY id)
        FROM groups
   ) g ON g.id = i.group_id
), update_items AS (
   UPDATE items i                           -- 2
   SET group_id = s.first_value
   FROM (SELECT * FROM new_group_id) s
   WHERE s.id = i.id
)
DELETE FROM groups                          -- 3
WHERE id IN (
    SELECT g_id
    FROM new_group_id
    WHERE g_id <> first_value
);
  1. First calculate the new group_id as we did above. Store the result in a CTE because we reuse it later twice.
  2. Do the UPDATE statement like we did above. Only difference is that we do not use a subquery but the previously calculated CTE result
  3. After updating we can use the query once more to find the records which should be deleted (every record where the first_value is not the same as the actual group's id). Execute the DELETE statement.

Upvotes: 1

Related Questions