Reputation: 91
I want to make my SQL execution faster.
I have the table with a XML column. XML stored is in this format:
<r eid="bdcd3ba2-e370-4140-8c46-01aed584bab2">
<name>
<customValue>Aperol</customValue>
</name>
<description></description>
<num>46310660710877889975</num>
<outerEconomicActivityNomenclatureCode>ccbdd540-3bb4-4a9f-a8d3-5a31b9459846</outerEconomicActivityNomenclatureCode>
<type>DISH</type>
</r>
I want to get <outerEconomicActivityNomenclatureCode>
value from XML row, where <type>
in XML is DISH and <num>
is in ('46310660710877889975', '46310660710877889976', '46310660710877889975', '46310660710877889975')
My SQL queries:
this takes 1 minute and 44 seconds
SELECT
m.c.value('(name/customValue/text())[1]', 'varchar(50)') AS dish
FROM
_fabbrica.dbo.entity dish
CROSS APPLY
(SELECT CAST(dish.xml AS xml) AS realxml) s
CROSS APPLY
s.realxml.nodes('r') m(c)
WHERE
dish.type = 'Product'
AND m.c.value('(type/text())[1]', 'varchar(50)') = 'DISH'
AND m.c.value('(num/text())[1]', 'varchar(50)') IN ('94948475659909879621', 'code', 'code')
This query takes 2 minutes
SELECT
dish.id,
SUBSTRING(dish.xml, (CHARINDEX('<customValue>', dish.xml) + 13), (CHARINDEX('</customValue>', dish.xml) - CHARINDEX('<customValue>', dish.xml) - 13)) AS 'Dish',
SUBSTRING(outerEanCode.xml, (CHARINDEX('<outerEanCode>', outerEanCode.xml) + 14), (CHARINDEX('</outerEanCode>', outerEanCode.xml) - CHARINDEX('<outerEanCode>', outerEanCode.xml) - 14)) AS 'OuterEanCode'
FROM
[_fabbrica].[dbo].[entity] dish
JOIN
[_fabbrica].[dbo].[entity] outerEanCode ON outerEanCode.id = CASE WHEN CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) > 0 THEN SUBSTRING( dish.xml, ( CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) + 39 ), ( CHARINDEX( '</outerEconomicActivityNomenclatureCode>', dish.xml ) - CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) - 39 ) ) ELSE '61D63FE7-212C-4847-BA32-1563D97E2424' END
WHERE
dish.type = 'Product'
AND SUBSTRING(dish.xml, (CHARINDEX('<num>', dish.xml) + 5), (CHARINDEX('</num>', dish.xml) - CHARINDEX('<num>', dish.xml) - 5)) IN ('94948475659909879621')
Are there some ways to change or create another query to reduce the execution time?
Upvotes: 3
Views: 747
Reputation: 453453
The m.c.value('(num/text())[1]', 'varchar(50)') in (...)
list seems to be optimised badly (execution plan link from comments) with each element in the list adding a sub tree to the plan. The concatenation operator at the top of all these is showing an elapsed time of 1min 28 seconds.
One way to avoid this would be to just evaluate m.c.value('(num/text())[1]', 'varchar(50)')
once per document and materializing the result to a temp table or table variable and then querying that with an IN
.
But the below seems to solve the issue without that
SELECT m.c.value('(name/customValue/text())[1]', 'varchar(50)') as dish
FROM _fabbrica.dbo.entity dish
CROSS APPLY (SELECT CAST(dish.xml as xml) as realxml) s
CROSS APPLY s.realxml.nodes('r[type = "DISH"][num =(
"46310660710877889975","94948475659909879621","94948475659909879704","94948474748856478602",
"46310660710877881223","94948475659909879363","94948474748847478248","26310360710077875944",
"94948475659909879207","94948475659909879700","94948475659909879650","94948474748856478612",
"94948474748856478618","94948475659909879704","94948474748856478496","46310660710877896883",
"46310660710877881223","94948475659909879363","94948474747747475446","94948475659909879366",
"94948475659909879121","26310360710077875994","94948475659909877377","94948475659909879123",
"94948475659909879124","94948475659909879122","94948474748847478246","94948475659909879549",
"94948474748856478594","94948474748847478244","94948475659909879597","94948475659909879363",
"94948474747747475446","94948475659909879366","94948475659909879121",
"94948475659909879123","94948475659909879122","46310660710877890446"
)]') m(c)
WHERE dish.type = 'Product'
You say in the comments that this has improved things and post the below execution plan. This still looks very slow to me. The elapsed time is 12.162 seconds even though the index seek to select the data was only 0.024 seconds. This still means that 99.8% of the execution time was taken up with parsing XML.
It looks to me that the actual casting of the string to XML is deferred and that this casting is being repeated every time the expression Expr1002
is referenced in the plan (by the red highlighted operators). In particular the CPU time spent by the filter which does nothing except have a start-up predicate on [Expr1002] IS NOT NULL
makes me think that.
If you are unable to change the source column to use XML datatype then you can do
SELECT CAST(xml AS xml) as realxml
INTO #Product
from _fabbrica.dbo.entity where type = 'Product'
And then change the query to operate against that temp table without any casting needed.
Upvotes: 3