Reputation: 25
I've found some similar questions here, but none of them seemed to actually fit my situation.
The table I'm working on is something like this, it's a table that records students performances on their courses:
| STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE | MARK | STATUS | VERSION |
| | | | | | |
| 1234 | 5678 | Mandatory | 70 | ACTIVE | 2 |
| 1234 | 5678 | Optional | 70 | HISTORY | 1 |
| 1234 | 5678 | Optional | null | HISTORY | 0 |
| 9876 | 4597 | Institutional | 99 | ACTIVE | 1 |
| 9876 | 4597 | Institutional | null | HISTORY | 0 |
I need to merge this with another table, that keeps track of the enrollments of students based on their group, so that I can insert or update the rows as needed:
| GROUP_ID | STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE |
| 4976555 | 1234 | 5678 | Mandatory |
| 6399875 | 1234 | 9034 | Optional |
| 6399875 | 9876 | 4597 | Institutional |
Long story short, I need to check if the enrollment type is the same or has changed, since some students get enrolled in courses based on the group they're part of, but the groups can change overnight.
All fine until here, but if the rows match, I also need to copy the row I'm about to update and set it as "HISTORY", so that we can keep a record of all the updates that have been made to a certain row.
At the moment, I have a typical
MERGE INTO performance USING group_enrollments
ON performance.STUDENT_ID = group_enrollments.STUDENT_ID
AND performance.COURSE_ID = group_enrollments.COURSE_ID
WHEN MATCHED THEN UPDATE ......
WHEN NOT MATCHED THEN INSERT ......
The people who worked on this part of code before me thought that it would be a great idea just to copy all the rows as "HISTORY" ones before doing the merge, but this is giving us problems as this procedure runs every night and writes more than 150.000 rows every time. Any ideas?
16/01 12.34: updated, more info about the tables and their relation
Upvotes: 2
Views: 10920
Reputation: 23588
Ok, so first off, I would write a query that produced the rows to be updated and/or inserted:
WITH performance AS (SELECT 1234 student_id, 5678 course_id, 'Mandatory' enrollment_type, 70 mark, 'ACTIVE' status, 2 VERSION FROM dual UNION ALL
SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, 70 mark, 'HISTORY' status, 1 VERSION FROM dual UNION ALL
SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual UNION ALL
SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, 99 mark, 'ACTIVE' status, 1 VERSION FROM dual UNION ALL
SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual),
group_enrollments AS (SELECT 4976555 group_id, 1234 student_id, 5678 course_id, 'Mandatory2' enrollment_type FROM dual UNION ALL
SELECT 6399875 group_id, 1234 student_id, 9034 course_id, 'Optional' enrollment_type FROM dual UNION ALL
SELECT 6399875 group_id, 9876 student_id, 4597 course_id, 'Institutional' enrollment_type FROM dual)
-- end of mimicking your tables with data in them
SELECT res.student_id,
res.course_id,
CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
WHEN dummy.id = 2 THEN res.old_enrollment_type
END enrollment_type,
res.mark,
CASE WHEN dummy.id = 1 THEN 'ACTIVE'
WHEN dummy.id = 2 THEN 'HISTORY'
END status,
CASE WHEN dummy.id = 1 THEN res.new_version
WHEN dummy.id = 2 THEN res.old_version
END VERSION
FROM (SELECT ge.student_id,
ge.course_id,
ge.enrollment_type new_enrollment_type,
p.enrollment_type old_enrollment_type,
p.mark,
p.status,
p.version old_version,
nvl(p.version + 1, 0) new_VERSION
-- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
FROM group_enrollments ge
LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
AND ge.course_id = p.course_id
WHERE (p.status = 'ACTIVE' OR p.status IS NULL)
AND (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND res.status = 'ACTIVE');
STUDENT_ID COURSE_ID ENROLLMENT_TYPE MARK STATUS VERSION
---------- ---------- --------------- ---------- ------- ----------
1234 5678 Mandatory2 70 ACTIVE 3
1234 9034 Optional ACTIVE 0
1234 5678 Mandatory 70 HISTORY 2
This query first of all finds any rows that are brand new (i.e. rows in the group_enrollment table that don't have a row in the performance table) or have a different enrollment_type. These are the rows that need inserting or updating.
Once we know that, we can then join a dummy, 2-row table table such that we'll always join to the first dummy row regardless of whether we need to insert or update, but we'll only join to the second dummy row if we need to update. This means we will only ever have one row for an insert, but two rows for an update.
Then it's an easy matter of outputting the correct values based on the dummy.id (new values for the first dummy row, old values for the second dummy row.
Once we've done that, we know what data needs to be merged into the performance table, so now the merge statement will look something like:
merge into performance tgt
using (SELECT res.student_id,
res.course_id,
CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
WHEN dummy.id = 2 THEN res.old_enrollment_type
END enrollment_type,
res.mark,
CASE WHEN dummy.id = 1 THEN 'ACTIVE'
WHEN dummy.id = 2 THEN 'HISTORY'
END status,
CASE WHEN dummy.id = 1 THEN res.new_version
WHEN dummy.id = 2 THEN res.old_version
END VERSION
FROM (SELECT ge.student_id,
ge.course_id,
ge.enrollment_type new_enrollment_type,
p.enrollment_type old_enrollment_type,
p.mark,
p.status,
p.version old_version,
nvl(p.version + 1, 0) new_VERSION
-- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
FROM group_enrollments ge
LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
AND ge.course_id = p.course_id
WHERE (p.status = 'ACTIVE' OR p.status IS NULL)
AND (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND res.status = 'ACTIVE')) src
ON (tgt.student_id = src.student_id AND tgt.course_id = src.course_id AND tgt.status = src.status)
WHEN MATCHED THEN
UPDATE SET tgt.enrollment_type = src.enrollment_type,
tgt.version = src.version
WHEN NOT MATCHED THEN
INSERT (tgt.student_id, tgt.course_id, tgt.enrollment_type, tgt.mark, tgt.status, tgt.version)
VALUES (src.student_id, src.course_id, src.enrollment_type, src.mark, src.status, src.version);
For clarification purposes, here's a very simple example of the conditional duplication of rows (we could also call it a partial cross join, since all rows in one table are joined to at least one row in the other):
WITH sample_data AS (SELECT 100 ID, NULL status FROM dual UNION ALL -- expect only one row
SELECT 101 ID, 'A' status FROM dual UNION ALL -- expect two rows
SELECT 102 ID, 'B' status FROM dual -- expect only one row
)
SELECT dummy.id dummy_row_id,
sd.id,
sd.status
FROM sample_data sd
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND sd.status = 'A')
ORDER BY sd.id, dummy.id;
DUMMY_ROW_ID ID STATUS
------------ ---------- ------
1 100
1 101 A
2 101 A
1 102 B
You can see that for the id=101 row from the sample_data "table", we have two rows, but the other two ids only have one row each.
Hopefully that clarifies things for you?
Upvotes: 2