Nestor

Reputation: 13990

What index to create to speed up T-SQL MERGE

What index can I add to the table:

CREATE TABLE [WData](
    [Account] [varchar](50) NOT NULL,
    [Table] [varchar](50) NOT NULL,
    [BatchID] [datetime2](7) NOT NULL,
    [XmlRow] [xml] NULL
) ON [PRIMARY]

so that the following stored procedure runs faster? Right now it's quite slow.

CREATE PROCEDURE [Update_WData]
    @Account VARCHAR(50),
    @Table VARCHAR(50),
    @BatchID DATETIME2,
    @XmlRows xml 
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @input TABLE (
        [XmlRow] xml NULL
    );
    INSERT INTO @input (XmlRow)
    SELECT 
        c1.query('.')
    FROM @XmlRows.nodes('/Block/NewRow') AS t(c1);

    DECLARE @output TABLE ([ACTION] NVARCHAR(50) );

    MERGE WData AS t
    USING @input AS s 
    ON (t.Account = @Account AND t.[Table]=@Table AND CONVERT(VARCHAR(max),t.XmlRow)=CONVERT(VARCHAR(max),s.XmlRow))

    WHEN NOT MATCHED BY TARGET 
        THEN INSERT (Account,[Table],BatchID, [XmlRow]) 
        VALUES (@Account, @Table, @BatchID, s.XmlRow )
    WHEN MATCHED AND t.BatchID <> @BatchID
        THEN UPDATE SET t.BatchID = @BatchID
    OUTPUT 
        $ACTION
    INTO @output;
    SELECT [Action], COUNT(*) AS [Count]
    FROM @output
    GROUP BY [Action]
END

Upvotes: 1

Views: 1070

Answers (1)

Mikael Eriksson

Reputation: 138990

You have not told us about the selectivity of your clustered key, but since you have a performance issue I assume there are more than a few rows with duplicate values.

The join used to match rows from @input to WData will be a Nested Loops join with a Clustered Index Seek against WData on the inner side of the join. The estimated number of rows coming from @input is 1, so SQL Server thinks it only needs to do the range seek operation in WData once to find all the rows whose XML content it has to compare.

Let's say you have 10000 rows for a single combination of Account and [Table], and your XML contains 6 NewRow nodes that are shredded into six rows in @input.

[Image: query plan showing a Nested Loops join with a Clustered Index Seek on WData]

The Clustered Index Seek is executed 6 times, returning 10000 rows each time; the XML columns are then compared in the join, 60000 times in total.
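To make the scenario concrete, a hypothetical setup along these lines produces that shape of plan. The index name and the test data are made up, and the clustered index on (Account, [Table]) is an assumption: the question does not show one, but the seek in the plan implies it. The final suggestion further down recreates this index with H added to the key.

-- Assumed clustered index leading on (Account, [Table]).
CREATE CLUSTERED INDEX IX_WData ON WData(Account, [Table]);

-- Made-up test data: 10000 rows for a single Account/[Table] pair.
INSERT INTO WData (Account, [Table], BatchID, XmlRow)
SELECT TOP (10000)
       'A1',
       'T1',
       SYSDATETIME(),
       CAST('<NewRow><V>'
            + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10))
            + '</V></NewRow>' AS XML)
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;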

What you can do is add a persisted computed column that holds a hash of the XML column.

CREATE TABLE [WData](
    [Account] [varchar](50) NOT NULL,
    [Table] [varchar](50) NOT NULL,
    [BatchID] [datetime2](7) NOT NULL,
    [XmlRow] [xml] NULL,
    H AS CAST(HASHBYTES('SHA2_512', CAST([XmlRow] AS VARBINARY(MAX))) AS BINARY(64)) PERSISTED
) ON [PRIMARY]
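If the table already exists, the column can presumably be added in place instead of recreating the table; a minimal sketch using the same expression as above:

-- Adds the persisted hash column to the existing table.
ALTER TABLE WData
    ADD H AS CAST(HASHBYTES('SHA2_512',
                  CAST([XmlRow] AS VARBINARY(MAX))) AS BINARY(64)) PERSISTED;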

And also add the computed column to @input.

DECLARE @input TABLE (
    [XmlRow] XML NULL,
    H AS CAST(HASHBYTES('SHA2_512', CAST([XmlRow] AS VARBINARY(MAX))) AS BINARY(64)) PERSISTED
);

And use the column H in the ON clause of the MERGE statement.

ON (
    t.Account = @Account AND 
    t.[Table]=@Table AND 
    t.H = s.H AND 
    CONVERT(VARCHAR(max),t.XmlRow)=CONVERT(VARCHAR(max),s.XmlRow)
   )
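Put together, the MERGE inside the procedure would look roughly like this; everything else in the procedure stays as in the question:

MERGE WData AS t
USING @input AS s
ON (
    t.Account = @Account AND
    t.[Table] = @Table AND
    t.H = s.H AND
    CONVERT(VARCHAR(MAX), t.XmlRow) = CONVERT(VARCHAR(MAX), s.XmlRow)
   )
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Account, [Table], BatchID, [XmlRow])
    VALUES (@Account, @Table, @BatchID, s.XmlRow)
WHEN MATCHED AND t.BatchID <> @BatchID
    THEN UPDATE SET t.BatchID = @BatchID
OUTPUT $ACTION INTO @output;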

It is unlikely that you will get any collisions (the same hash for different XML values) from the generated hash, but the hash is only generated from the first 8000 bytes, so you should keep the original comparison of the XML columns as well.

After the modifications you end up with this query plan.

[Image: query plan with the hash check applied inside the Clustered Index Seek]

The number of reads is the same because the Clustered Index Seek is still executed 6 times, checking 10000 rows each time. However, the hash check is done inside the seek as a residual predicate; in this case it filters out every row, so the operator returns no rows at all and no comparisons are done on the XML column.

If you don't mind extending the clustered key by 64 bytes, you can also add H to the key and drastically lower the number of reads. In my test the reads went from 1261 down to 62.

CREATE CLUSTERED INDEX IX_WData ON WData(Account, [Table], H)
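To compare the number of reads before and after on your own data, SET STATISTICS IO is a quick check; the parameter values below are placeholders:

SET STATISTICS IO ON;

EXEC Update_WData
     @Account = 'A1',      -- placeholder values; substitute your own
     @Table   = 'T1',
     @BatchID = '2016-01-01',
     @XmlRows = '<Block><NewRow><V>1</V></NewRow></Block>';

SET STATISTICS IO OFF;     -- logical reads are printed on the Messages tab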

Upvotes: 5
