Reputation: 10680
Like many developers, I perform a lot of merge operations with data, specifically SQL Server.
Historically, I have used the old trick of:-
1) Doing a left join on the existing data, and inserting anything I don't have a corresponding record for.
2) After 1), updating rows in my target table.
I have to take a performance hit on 1). It's unavoidable. However, on 2), I have been rather profligate. Instead of just updating stuff that needs updating, I've updated everything I've matched ( whether the underlying data has changed or not ).
Turns out that SQL Server isn't too smart about this sort of update. It performs no pre-check to determine that what you are about to update isn't the same thing as what you are using to update it. Hence, updates done along these lines result in a physical write and impact any indexes that reference the field.
So, from my POV, my choices are as follows:-
1) Carry on as normal, basking in the current profligacy of my routine (and refreshing indexes daily on large DBs)
2) Write more UPDATE statements that update a specific field if the field has changed.
e.g.
UPDATE
p2
SET
[SpecificField] = p1.[SpecificField]
FROM
@source p1,
Dest p2
WHERE
p2.ExternalKey = p1.ExternalKey
AND COALESCE(p1.[SpecificField],'') <> COALESCE(p2.[SpecificField],'')
3) Something infinitely better that the Stack Overflow community suggests.
I'd really like to go with 3). Are my options really limited to 1 or 2? Note. I have looked into MERGE INTO. Same problems, really.
Upvotes: 4
Views: 2380
Reputation: 15469
With MERGE INTO
, you have the option of adding an additional search clause to a WHEN (NOT) MATCHED
clause. For example,
MERGE INTO table_to_upsert AS target
USING source_table AS source
ON target.key1 = source.key1
AND target.key2 = source.key2
AND target.key3 = source.key3
WHEN MATCHED AND (target.value <> source.value) THEN
UPDATE SET target.value = source.value,
target.timestamp = source.timestamp
WHEN MATCHED AND (target.userid <> source.userid) THEN
UPDATE SET target.userid = source.userid,
target.timestamp = source.timestamp
WHEN NOT MATCHED THEN
INSERT (key1, key2, key3,
value, userid, timestamp)
VALUES (source.key1, source.key2, source.key3,
source.value, source.userid, source.timestamp)
However, this doesn't really solve your problem if multiple columns are updated at a time, the MERGE
would take the first WHEN (NOT) MATCHED
that evaluates true (similar to a CASE
statement).
Upvotes: 2