Kimberly Anne
Kimberly Anne

Reputation: 25

Oracle SQL - when matched, update AND insert

I've found some similar questions here, but none of them seemed to actually fit my situation.

The table I'm working on is something like this, it's a table that records students performances on their courses:

| STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE | MARK | STATUS  | VERSION |
|            |           |                 |      |         |         |
| 1234       | 5678      | Mandatory       | 70   | ACTIVE  | 2       |
| 1234       | 5678      | Optional        | 70   | HISTORY | 1       |
| 1234       | 5678      | Optional        | null | HISTORY | 0       |
| 9876       | 4597      | Institutional   | 99   | ACTIVE  | 1       |
| 9876       | 4597      | Institutional   | null | HISTORY | 0       |

I need to merge this with another table, that keeps track of the enrollments of students based on their group, so that I can insert or update the rows as needed:

| GROUP_ID | STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE |
| 4976555  | 1234       | 5678      | Mandatory       |
| 6399875  | 1234       | 9034      | Optional        |
| 6399875  | 9876       | 4597      | Institutional   |

Long story short, I need to check if the enrollment type is the same or has changed, since some students get enrolled in courses based on the group they're part of, but the groups can change overnight.

All fine until here, but if the rows match, I also need to copy the row I'm about to update and set it as "HISTORY", so that we can keep a record of all the updates that have been made to a certain row.

At the moment, I have a typical

MERGE INTO performance USING group_enrollments
ON performance.STUDENT_ID = group_enrollments.STUDENT_ID 
AND performance.COURSE_ID = group_enrollments.COURSE_ID
WHEN MATCHED THEN UPDATE ......
WHEN NOT MATCHED THEN INSERT ......

The people who worked on this part of code before me thought that it would be a great idea just to copy all the rows as "HISTORY" ones before doing the merge, but this is giving us problems as this procedure runs every night and writes more than 150.000 rows every time. Any ideas?

16/01 12.34: updated, more info about the tables and their relation

Upvotes: 2

Views: 10920

Answers (1)

Boneist
Boneist

Reputation: 23588

Ok, so first off, I would write a query that produced the rows to be updated and/or inserted:

WITH    performance AS (SELECT 1234 student_id, 5678 course_id, 'Mandatory' enrollment_type, 70 mark, 'ACTIVE' status, 2 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, 70 mark, 'HISTORY' status, 1 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, 99 mark, 'ACTIVE' status, 1 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual),
  group_enrollments AS (SELECT 4976555 group_id, 1234 student_id, 5678 course_id, 'Mandatory2' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 1234 student_id, 9034 course_id, 'Optional' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 9876 student_id, 4597 course_id, 'Institutional' enrollment_type FROM dual)
-- end of mimicking your tables with data in them
SELECT res.student_id,
       res.course_id,
       CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
            WHEN dummy.id = 2 THEN res.old_enrollment_type
       END enrollment_type,
       res.mark,
       CASE WHEN dummy.id = 1 THEN 'ACTIVE'
            WHEN dummy.id = 2 THEN 'HISTORY'
       END status,
       CASE WHEN dummy.id = 1 THEN res.new_version
            WHEN dummy.id = 2 THEN res.old_version
       END VERSION 
FROM   (SELECT ge.student_id,
               ge.course_id,
               ge.enrollment_type new_enrollment_type,
               p.enrollment_type old_enrollment_type,
               p.mark,
               p.status,
               p.version old_version,
               nvl(p.version + 1, 0) new_VERSION
                 -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
        FROM   group_enrollments ge
               LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                AND ge.course_id = p.course_id
        WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
        AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
        INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                    SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                    OR (dummy.id = 2 
                                                        AND res.status = 'ACTIVE');

STUDENT_ID  COURSE_ID ENROLLMENT_TYPE       MARK STATUS     VERSION
---------- ---------- --------------- ---------- ------- ----------
      1234       5678 Mandatory2              70 ACTIVE           3
      1234       9034 Optional                   ACTIVE           0
      1234       5678 Mandatory               70 HISTORY          2

This query first of all finds any rows that are brand new (i.e. rows in the group_enrollment table that don't have a row in the performance table) or have a different enrollment_type. These are the rows that need inserting or updating.

Once we know that, we can then join a dummy, 2-row table table such that we'll always join to the first dummy row regardless of whether we need to insert or update, but we'll only join to the second dummy row if we need to update. This means we will only ever have one row for an insert, but two rows for an update.

Then it's an easy matter of outputting the correct values based on the dummy.id (new values for the first dummy row, old values for the second dummy row.

Once we've done that, we know what data needs to be merged into the performance table, so now the merge statement will look something like:

merge into performance tgt
  using (SELECT res.student_id,
                res.course_id,
                CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
                     WHEN dummy.id = 2 THEN res.old_enrollment_type
                END enrollment_type,
                res.mark,
                CASE WHEN dummy.id = 1 THEN 'ACTIVE'
                     WHEN dummy.id = 2 THEN 'HISTORY'
                END status,
                CASE WHEN dummy.id = 1 THEN res.new_version
                     WHEN dummy.id = 2 THEN res.old_version
                END VERSION 
         FROM   (SELECT ge.student_id,
                        ge.course_id,
                        ge.enrollment_type new_enrollment_type,
                        p.enrollment_type old_enrollment_type,
                        p.mark,
                        p.status,
                        p.version old_version,
                        nvl(p.version + 1, 0) new_VERSION
                          -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
                 FROM   group_enrollments ge
                        LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                         AND ge.course_id = p.course_id
                 WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
                 AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
                 INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                             SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                             OR (dummy.id = 2 
                                                                 AND res.status = 'ACTIVE')) src
    ON (tgt.student_id = src.student_id AND tgt.course_id = src.course_id AND tgt.status = src.status)
WHEN MATCHED THEN
  UPDATE SET tgt.enrollment_type = src.enrollment_type,
             tgt.version = src.version
WHEN NOT MATCHED THEN
  INSERT (tgt.student_id, tgt.course_id, tgt.enrollment_type, tgt.mark, tgt.status, tgt.version)
  VALUES (src.student_id, src.course_id, src.enrollment_type, src.mark, src.status, src.version);

For clarification purposes, here's a very simple example of the conditional duplication of rows (we could also call it a partial cross join, since all rows in one table are joined to at least one row in the other):

WITH sample_data AS (SELECT 100 ID, NULL status FROM dual UNION ALL -- expect only one row
                     SELECT 101 ID, 'A' status FROM dual UNION ALL -- expect two rows
                     SELECT 102 ID, 'B' status FROM dual -- expect only one row
                    )
SELECT dummy.id dummy_row_id,
       sd.id,
       sd.status
FROM   sample_data sd
       INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                   SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                   OR (dummy.id = 2 
                                                       AND sd.status = 'A')
ORDER BY sd.id, dummy.id;

DUMMY_ROW_ID         ID STATUS
------------ ---------- ------
           1        100 
           1        101 A
           2        101 A
           1        102 B

You can see that for the id=101 row from the sample_data "table", we have two rows, but the other two ids only have one row each.

Hopefully that clarifies things for you?

Upvotes: 2

Related Questions