Reputation: 13990
What index can I add to the table:
CREATE TABLE [WData](
    [Account] [varchar](50) NOT NULL,
    [Table] [varchar](50) NOT NULL,
    [BatchID] [datetime2](7) NOT NULL,
    [XmlRow] [xml] NULL
) ON [PRIMARY]
so that the following stored procedure runs faster? Right now it's quite slow.
CREATE PROCEDURE [Update_WData]
    @Account VARCHAR(50),
    @Table VARCHAR(50),
    @BatchID DATETIME2,
    @XmlRows XML
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @input TABLE (
        [XmlRow] XML NULL
    );

    INSERT INTO @input (XmlRow)
    SELECT c1.query('.')
    FROM @XmlRows.nodes('/Block/NewRow') AS t(c1);

    DECLARE @output TABLE ([Action] NVARCHAR(50));

    MERGE WData AS t
    USING @input AS s
        ON (t.Account = @Account
            AND t.[Table] = @Table
            AND CONVERT(VARCHAR(MAX), t.XmlRow) = CONVERT(VARCHAR(MAX), s.XmlRow))
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Account, [Table], BatchID, [XmlRow])
        VALUES (@Account, @Table, @BatchID, s.XmlRow)
    WHEN MATCHED AND t.BatchID <> @BatchID THEN
        UPDATE SET t.BatchID = @BatchID
    OUTPUT $ACTION INTO @output;

    SELECT [Action], COUNT(*) AS [Count]
    FROM @output
    GROUP BY [Action];
END
Upvotes: 1
Views: 1070
Reputation: 138990
You have not told us about the selectivity of your clustered key, but I assume there are more than a few rows with duplicate values, since you have a performance issue.
The join used to match rows from @input to WData will be a Nested Loops join with a Clustered Index Seek against WData on the inner side of the join. The estimated number of rows coming from @input is 1, so SQL Server thinks it only needs to do the range seek in WData once to find all the rows whose XML column content it needs to compare.
Let's say you have 10000 rows for a unique combination of Account and [Table], and your XML contains 6 NewRow nodes that are shredded into six rows in @input. The Clustered Index Seek is executed 6 times, returning 10000 rows each time, and the XML columns are then compared in the join 60000 times.
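You can confirm this plan shape on your own data by capturing the actual execution plan for one call; a minimal sketch (the account, table, and XML values below are made-up placeholders, not from the question):

```sql
-- Sketch: capture the actual plan for one call (placeholder values).
SET STATISTICS XML ON;  -- returns the actual execution plan as XML

DECLARE @rows XML = N'<Block>
    <NewRow><SomeValue>1</SomeValue></NewRow>
    <NewRow><SomeValue>2</SomeValue></NewRow>
</Block>';

EXEC Update_WData
    @Account = 'SomeAccount',
    @Table   = 'SomeTable',
    @BatchID = SYSDATETIME(),
    @XmlRows = @rows;

SET STATISTICS XML OFF;
-- In the plan, the Clustered Index Seek on WData shows a
-- "Number of Executions" equal to the number of shredded rows.
```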
What you can do is add a persisted computed column that holds a hash of the XML column.
CREATE TABLE [WData](
    [Account] [varchar](50) NOT NULL,
    [Table] [varchar](50) NOT NULL,
    [BatchID] [datetime2](7) NOT NULL,
    [XmlRow] [xml] NULL,
    H AS CAST(HASHBYTES('SHA2_512', CAST([XmlRow] AS VARBINARY(MAX))) AS BINARY(64)) PERSISTED
) ON [PRIMARY]
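Because the column is computed and persisted, SQL Server maintains it on every insert and update, so existing INSERT statements need no change. A small sketch (sample values are placeholders):

```sql
-- Sketch: H is filled in automatically by the engine.
INSERT INTO WData (Account, [Table], BatchID, XmlRow)
VALUES ('SomeAccount', 'SomeTable', SYSDATETIME(),
        N'<NewRow><SomeValue>1</SomeValue></NewRow>');

-- H holds the SHA2_512 digest cast to BINARY(64), so DATALENGTH(H) = 64.
SELECT Account, DATALENGTH(H) AS HashLength
FROM WData;
```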
Also add the computed column to @input.
DECLARE @input TABLE (
    [XmlRow] XML NULL,
    H AS CAST(HASHBYTES('SHA2_512', CAST([XmlRow] AS VARBINARY(MAX))) AS BINARY(64)) PERSISTED
);
Then use the column H in the ON clause of the MERGE statement.
ON (
    t.Account = @Account AND
    t.[Table] = @Table AND
    t.H = s.H AND
    CONVERT(VARCHAR(MAX), t.XmlRow) = CONVERT(VARCHAR(MAX), s.XmlRow)
)
It is unlikely that you will get any collisions (the same hash for different XML values), but HASHBYTES only hashes the first 8000 bytes of its input (on versions before SQL Server 2016), so you should keep the original comparison of the XML column as a final check.
After the modifications you end up with this query plan.
The number of reads is the same because the Clustered Index Seek is still executed 6 times, checking 10000 rows each time. But the check against the hash is done inside the seek as a residual predicate, so the operator returns no rows at all in this case, and no comparisons are done on the XML column.
If you don't mind extending the clustered key by 64 bytes, you can also add H to the key and drastically lower the number of reads. In my test it went from 1261 down to 62.
CREATE CLUSTERED INDEX IX_WData ON WData(Account, [Table], H)
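To see the effect on your own data, compare the logical reads reported for WData before and after creating the index; a sketch (placeholder values again):

```sql
-- Sketch: compare logical reads on WData before/after the new clustered index.
SET STATISTICS IO ON;   -- logical reads per table, shown in the Messages tab

DECLARE @rows XML = N'<Block><NewRow><SomeValue>1</SomeValue></NewRow></Block>';

EXEC Update_WData
    @Account = 'SomeAccount',
    @Table   = 'SomeTable',
    @BatchID = SYSDATETIME(),
    @XmlRows = @rows;

SET STATISTICS IO OFF;
-- With H in the key, the seek can navigate on (Account, [Table], H) directly
-- instead of reading every row in the Account/[Table] range.
```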
Upvotes: 5