Reputation: 2440
The tables range from 1M to 200M records. All tables in question have zero indexes on them, as they are staging/landing areas for our raw data sources. The purpose of the cast(NULL as varchar(max))
below was to handle one situation where the HASHBYTES input was a combination of varchar(255) columns that totaled more than 8,000 bytes, so a developer applied it to our entire loading process across the board.
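For context on why that cast was added at all: if none of CONCAT's arguments is a (max) type, SQL Server truncates the concatenated result to 8,000 bytes, so the tail of the hash input would be silently dropped; adding any varchar(max) argument, even a NULL, promotes the result to varchar(max). A quick illustration (the variable names are throwaway, just for demonstration):

DECLARE @a varchar(8000) = REPLICATE('x', 8000);
DECLARE @b varchar(255)  = 'tail';

SELECT
    DATALENGTH(CONCAT(@a, '|', @b))                             AS WithoutMaxArg, -- 8000: the '|tail' portion is truncated away
    DATALENGTH(CONCAT(@a, '|', @b, CAST(NULL AS varchar(MAX)))) AS WithMaxArg;    -- 8005: full input preserved

On SQL Server 2016 and later, HASHBYTES accepts a varchar(max) input, so the longer value can still be hashed.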
This HASHBYTES value is used to identify when a record has changed for the incremental load.
I attempted to reproduce the problem with small temp tables, but could not; a simplified sketch of what I tried follows the expression below.
CAST(
    HASHBYTES(
        'SHA2_256',
        CONCAT(
            [CTE].[TransactionDetailTypeCode],   --varchar(50)
            '|',
            [CTE].[TransactionDetailAmount],     --decimal(19,4)
            '|',
            [CTE].[OriginalTransactionPostDate], --date
            '|',
            [CTE].[SourceSystemCode],            --varchar(50)
            '|',
            [CTE].[EtlSourceTransaction],        --varchar(500)
            '|',
            [CTE].[EtlSourceTransactionDetail],  --varchar(500)
            '|',
            CAST(NULL AS VARCHAR(MAX))           --COMMENT THIS LINE OUT
        )
    ) AS varbinary(32)
) AS [EtlRowHashValue]
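For reference, my repro attempt was shaped roughly like the following (the temp tables and column names here are simplified placeholders, not our real schema), comparing the plan for the same statement with and without the CAST(NULL AS varchar(MAX)) argument; at this small scale I could not get the plan to degrade:

CREATE TABLE #stg (Id int NOT NULL, Code varchar(50), Amt decimal(19,4));
CREATE TABLE #tgt (Id int NOT NULL, EtlRowHashValue varbinary(32) NOT NULL);

INSERT INTO #stg (Id, Code, Amt) VALUES (1, 'A', 10.00), (2, 'B', 20.00);
INSERT INTO #tgt (Id, EtlRowHashValue) VALUES (1, 0x00), (2, 0x00);

-- Run once as-is and once with the CAST(NULL AS varchar(MAX)) argument removed,
-- then compare the estimated plans.
SELECT s.Id
FROM   #stg AS s
JOIN   #tgt AS t
       ON t.Id = s.Id
WHERE  t.EtlRowHashValue <> CAST(
           HASHBYTES(
               'SHA2_256',
               CONCAT(s.Code, '|', s.Amt, '|', CAST(NULL AS varchar(MAX)))
           ) AS varbinary(32));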
My findings are that the subtree cost with that line included is over 22 times higher than with it commented out. I also noticed it forcing a sort to feed the merge join, which I suspect is the cause of the abysmal performance. I'm assuming there's no easy answer here, but out of curiosity: can someone explain whether it's the use of HASHBYTES, the CONCAT, or some other reason that makes including cast(NULL as varchar(max)) change the query plan this drastically?
Is this caused by some under-the-hood behavior that I'm simply not aware of?
Upvotes: 2
Views: 79