Tomo
Tomo

Reputation: 434

T-SQL MERGE won't handle NULLs as expected

If this query:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 

returns 11 records and this query:

SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 17 records and the INTERSECT between the two:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 
INTERSECT 
SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 9 records, when I do a MERGE like this:

MERGE dbo.TargetTbl AS TARGET
USING (
       SELECT   OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3,
                MAX(LastDate) AS LastDate
       FROM dbo.SourceTbl
       GROUP BY OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine), SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3
      ) AS SOURCE 
      ON CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine) = OrderNoLine 
         AND TARGET.ProdOrder = SOURCE.ProdOrder
         AND TARGET.Lvl1 = SOURCE.Lvl1
         AND TARGET.Lvl2 = SOURCE.Lvl2
         AND TARGET.Lvl3 = SOURCE.Lvl3  
         AND TARGET.LastDate = SOURCE.LastDate
WHEN MATCHED AND EXISTS (SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine)
                               ,SOURCE.ProdOrder
                               ,SOURCE.Lvl1
                               ,SOURCE.Lvl2
                               ,SOURCE.Lvl3 
                               ,SOURCE.LastDate
                         INTERSECT 
                         SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine)
                               ,TARGET.ProdOrder
                               ,TARGET.Lvl1
                               ,TARGET.Lvl2
                               ,TARGET.Lvl3
                               ,TARGET.LastDate
                        )
THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
WHEN NOT MATCHED BY TARGET 
THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
     VALUES (SOURCE.LastDate, 999, 999, SOURCE.OrderNo, SOURCE.OrderLine, SOURCE.SomeModel, SOURCE.ProdOrder, SOURCE.Lvl1, SOURCE.Lvl2, SOURCE.Lvl3, 1, GETDATE());

it should, according to this and this, UPDATE the 9 INTERSECT records of the TargetTbl and INSERT to that same table the remaining 2 records from the SourceTbl (11 in total). Instead it updates 4 records and inserts 6 records (10 in total). Two records in the SourceTbl are duplicated and that's the reason of 10 instead of 11 and that's also why I used the MAX & GROUP BY.

I think it's the first part of the query, the USING part, that cannot handle NULLs correctly even if the INTERSECT part does its job. I tried everything I was able to, but no success. I'm sure it's something easily doable, so please, help me. Thank you.

EDIT: SourceTbl data by using SELECT OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate AS LastDate FROM dbo.SourceTbl ORDER BY OrderNo, OrderLine, SomeModel, ProdOrder, irrelevant columns are omitted:

OrderNo OrderLine   OrderNoLine SomeModel   ProdOrder   Lvl1    Lvl2    Lvl3    LastDate
123c08637   10  123c08637_10    4321525175_004321   A5C008837   Abcd    Efgh    Olol    04/03/2030
123c11214   10  123c11214_10    4321532622_000391   NULL    NULL    NULL    NULL    07/07/2018
123c13039   10  123c13039_10    4321525175_002611   A5C014838   NULL    NULL    NULL    18/05/2018
123c16059   10  123c16059_10    4321541488_001111   A5C018611   NULL    NULL    NULL    18/05/2018
123c17482   10  123c17482_10    4321506480_001711   A5C019227   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   10  123c17482_10    4321506480_001711   A5C047712   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    xxxx    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B200472   NULL    NULL    NULL    18/05/2018
123c32405   10  123c32405_10    8765525667_005301   NULL    Qwer    Uiop    Tygh    12/12/2018

Upvotes: 0

Views: 2791

Answers (3)

Eric
Eric

Reputation: 81

Use INTERSECT to compare NULLs properly:

MERGE dbo.TargetTbl AS TARGET
USING 
(   SELECT DISTINCT OrderNo, OrderLine, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate 
    FROM dbo.SourceTbl
) AS SOURCE ON EXISTS
    (  SELECT TARGET.OrderNo, TARGET.OrderLine, TARGET.ProdOrder
         , TARGET.Lvl1, TARGET.Lvl2, TARGET.Lvl3, TARGET.LastDate 
       INTERSECT
       SELECT SOURCE.OrderNo, SOURCE.OrderLine, SOURCE.ProdOrder
         , SOURCE.Lvl1, SOURCE.Lvl2, SOURCE.Lvl3, SOURCE.LastDate
    )
     

Upvotes: 0

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239784

Some features of the SQL language use a concept of distinctness (notably, DISTINCT and GROUP BY) where it's notable that NULL IS NOT DISTINCT FROM NULL is true. This is also surfaced in UNION (ALL), EXCEPT, INTERSECT, etc.

Unfortunately, SQL Server does not yet implement the IS (NOT) DISTINCT FROM operator from standard SQL, in and of iteself; And so you're left using equality comparisons, where famously in SQL, NULL = NULL is unknown (not true or false). So you have to explicitly perform NULL checks in your ON clause (until a future version of SQL Server supports the DISTINCT FROM operator)

Upvotes: 1

Wolfgang Kais
Wolfgang Kais

Reputation: 4100

The GROUP BY might reduce the number of records to only one (if the 11 records only differ in the LastDate column and if SomeModel contains the same value for all 11 records) or it might result in all 11 records (if SomeModel contains unique values), so that GROUP BY does not neccessarily resturn the 10 distinct rows. To achieve this, use SELECT DISTINCT instead of grouping by a subset of the columns.

Also, if the ON condition worked as you seem to expect it, the additional EXISTS condition was obsolete. Obviously, 4 matches are found and 6 records have no match. Within these 6, there could be 2 records that indeed have no match and 4 records that do not match because of NULL values.

To take care of NULL values, I suggest to change the whole statement to something like this:

MERGE dbo.TargetTbl AS TARGET
USING (
       SELECT DISTINCT OrderNo, OrderLine, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate
       FROM dbo.SourceTbl
      ) AS SOURCE 
      ON     (TARGET.OrderNo = SOURCE.OrderNo OR TARGET.OrderNo IS NULL AND SOURCE.OrderNo IS NULL)
         AND (TARGET.OrderLine = SOURCE.OrderLine OR TARGET.OrderLine IS NULL AND SOURCE.OrderLine IS NULL)
         AND (TARGET.ProdOrder = SOURCE.ProdOrder OR TARGET.ProdOrder IS NULL AND SOURCE.ProdOrder IS NULL)
         AND (TARGET.Lvl1 = SOURCE.Lvl1 OR TARGET.Lvl1 IS NULL AND SOURCE.Lvl1 IS NULL)
         AND (TARGET.Lvl2 = SOURCE.Lvl2 OR TARGET.Lvl2 IS NULL AND SOURCE.Lvl2 IS NULL)
         AND (TARGET.Lvl3 = SOURCE.Lvl3 OR TARGET.Lvl3 IS NULL AND SOURCE.Lvl3 IS NULL)
         AND (TARGET.LastDate = SOURCE.LastDate OR TARGET.LastDate IS NULL AND SOURCE.LastDate IS NULL)
WHEN MATCHED 
THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
WHEN NOT MATCHED BY TARGET 
THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
     VALUES (LastDate, 999, 999, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, 1, GETDATE());

Upvotes: 1

Related Questions