Chahk
Chahk

Reputation: 75

Tuning a query to parse XML data on SQL Server 2014

I have a table on a SQL Server 2014 database that stores auditing information for record changes in a VARCHAR(MAX) column (poor man's CDC.)

This data is in following format:

<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />
...

I need to parse that information in order to retrieve the data transposed so that it looks like so:

Record    FieldName      OldValue   NewValue
------    ---------      --------   --------
1234      Assigned To    user1      user2
1234      Status         QA         Development
1234      Progress       Yes        No

The stored procedure attempts to do this by converting the data to XML and then using XPath retrieving the necessary pieces:

;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM dbo.Trans
   WHERE TransDate >= '11/1/2016'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJF2GRB7g

The corresponding IO stats:

Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.

This query is excruciatingly slow (the example was for just 10 days' worth of data,) and gets progressively slower with more data.

What are my options for tuning this query?

Upvotes: 2

Views: 1195

Answers (2)

pacreely
pacreely

Reputation: 1931

CROSS JOIN is one of those things that just doesn't scale very well, as your tables get larger the "number of executions" on your "nested loops" grows exponentially. On the execution plan you submitted the figures were in excess of 0.6 million on each loop. Your logical reads are genuinely low, but the pages get processes over and over again. (If your query size ever spills over your buffer size and spools onto disk then you'll experience real hurt.)

Here's a solution that allows you to take advantage of XML indexes, it might help your situation.

--PREPARE SAMPLE DATA
DROP TABLE #Trans
CREATE TABLE #Trans(TransID INT
                    ,TransDate DATE
                    ,TransDescription VARBINARY(MAX)
                    )
INSERT INTO #Trans VALUES
(
1, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(2, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(3, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))

---------------------------------------------------------------------------------------------------
--RUN BELOW THIS LINE COLLECTIVELY, THE ORIGINAL QUERY IS SHOWING UP WITH APPROX 93% OR OVERALL COST

--BUILD A TEMP TABLE TO RECIEVE XML FORMATTED DATA
DROP TABLE #XmlData
CREATE TABLE #XmlData (
    TransId INT NOT NULL,
    TransXml xml NOT NULL,
CONSTRAINT [PK_XmlData] PRIMARY KEY CLUSTERED (TransId)
) 

--INSERT DATA INTO XML TABLE
INSERT INTO #XmlData
SELECT TransId,
    CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans
WHERE TransDate >= '11/1/2015'
    AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%')

--CREATE AN XML INDEX
CREATE PRIMARY XML INDEX PXML_TransXml
ON #XmlData(TransXml)

--APPLY NODES QUERY AGAINST XML INDEX
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM #XmlData TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

---------------------------------
--Original Query
;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM #Trans--dbo.Trans
   WHERE TransDate >= '11/1/2015'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))

SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
--INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

Upvotes: 1

Laughing Vergil
Laughing Vergil

Reputation: 3756

What you really need to speed things up is some xml indexing. However, since you are creating the XML on the fly, this isn't happening. Effectively, this is the broad equivalent of a CROSS JOIN, and will get exponentially slower as time goes by.

See cross apply xml query performs exponentially worse as xml document grows for a detailed discussion and how indexing helps. If you want to do this via XML, you really will need to store XML so you can index the XML.

Upvotes: 2

Related Questions