Reputation: 75
I have a table on a SQL Server 2014 database that stores auditing information for record changes in a VARCHAR(MAX)
column (poor man's CDC.)
This data is in following format:
<span class="fieldname">Assigned To</span>
changed from <span class="oldvalue">user1</span>
to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
changed from <span class="oldvalue">QA</span>
to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
changed from <span class="oldvalue">Yes</span>
to <span class="newvalue">No</span><br />
...
I need to parse that information in order to retrieve the data transposed so that it looks like so:
Record FieldName OldValue NewValue
------ --------- -------- --------
1234 Assigned To user1 user2
1234 Status QA Development
1234 Progress Yes No
The stored procedure attempts to do this by converting the data to XML and then using XPath retrieving the necessary pieces:
;WITH TT AS (
SELECT TransId,
CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM dbo.Trans
WHERE TransDate >= '11/1/2016'
AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
TransXml,
FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJF2GRB7g
The corresponding IO stats:
Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.
This query is excruciatingly slow (the example was for just 10 days' worth of data,) and gets progressively slower with more data.
What are my options for tuning this query?
Upvotes: 2
Views: 1195
Reputation: 1931
CROSS JOIN is one of those things that just doesn't scale very well, as your tables get larger the "number of executions" on your "nested loops" grows exponentially. On the execution plan you submitted the figures were in excess of 0.6 million on each loop. Your logical reads are genuinely low, but the pages get processes over and over again. (If your query size ever spills over your buffer size and spools onto disk then you'll experience real hurt.)
Here's a solution that allows you to take advantage of XML indexes, it might help your situation.
--PREPARE SAMPLE DATA
DROP TABLE #Trans
CREATE TABLE #Trans(TransID INT
,TransDate DATE
,TransDescription VARBINARY(MAX)
)
INSERT INTO #Trans VALUES
(
1, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
changed from <span class="oldvalue">user1</span>
to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
changed from <span class="oldvalue">QA</span>
to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
changed from <span class="oldvalue">Yes</span>
to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(2, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
changed from <span class="oldvalue">user1</span>
to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
changed from <span class="oldvalue">QA</span>
to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
changed from <span class="oldvalue">Yes</span>
to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(3, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
changed from <span class="oldvalue">user1</span>
to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
changed from <span class="oldvalue">QA</span>
to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
changed from <span class="oldvalue">Yes</span>
to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
---------------------------------------------------------------------------------------------------
--RUN BELOW THIS LINE COLLECTIVELY, THE ORIGINAL QUERY IS SHOWING UP WITH APPROX 93% OR OVERALL COST
--BUILD A TEMP TABLE TO RECIEVE XML FORMATTED DATA
DROP TABLE #XmlData
CREATE TABLE #XmlData (
TransId INT NOT NULL,
TransXml xml NOT NULL,
CONSTRAINT [PK_XmlData] PRIMARY KEY CLUSTERED (TransId)
)
--INSERT DATA INTO XML TABLE
INSERT INTO #XmlData
SELECT TransId,
CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans
WHERE TransDate >= '11/1/2015'
AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%')
--CREATE AN XML INDEX
CREATE PRIMARY XML INDEX PXML_TransXml
ON #XmlData(TransXml)
--APPLY NODES QUERY AGAINST XML INDEX
SELECT TransId,
TransXml,
FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM #XmlData TT
CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
---------------------------------
--Original Query
;WITH TT AS (
SELECT TransId,
CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans--dbo.Trans
WHERE TransDate >= '11/1/2015'
AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
TransXml,
FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
--INTO #tmp
FROM TT
CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
Upvotes: 1
Reputation: 3756
What you really need to speed things up is some xml indexing. However, since you are creating the XML on the fly, this isn't happening. Effectively, this is the broad equivalent of a CROSS JOIN, and will get exponentially slower as time goes by.
See cross apply xml query performs exponentially worse as xml document grows for a detailed discussion and how indexing helps. If you want to do this via XML, you really will need to store XML so you can index the XML.
Upvotes: 2