Myrkytyn
Myrkytyn

Reputation: 91

Slow SQL query to get values from xml where some value is in xml

I want to make my SQL execution faster.

I have the table with a XML column. XML stored is in this format:

<r eid="bdcd3ba2-e370-4140-8c46-01aed584bab2">
    <name>
        <customValue>Aperol</customValue>
    </name>
    <description></description>
    <num>46310660710877889975</num>
    <outerEconomicActivityNomenclatureCode>ccbdd540-3bb4-4a9f-a8d3-5a31b9459846</outerEconomicActivityNomenclatureCode>
    <type>DISH</type>
</r>

I want to get <outerEconomicActivityNomenclatureCode> value from XML row, where <type> in XML is DISH and <num> is in ('46310660710877889975', '46310660710877889976', '46310660710877889975', '46310660710877889975')

My SQL queries:

this takes 1 minute and 44 seconds

SELECT
    m.c.value('(name/customValue/text())[1]', 'varchar(50)') AS dish
FROM 
    _fabbrica.dbo.entity dish
CROSS APPLY 
    (SELECT CAST(dish.xml AS xml) AS realxml) s
CROSS APPLY
    s.realxml.nodes('r') m(c)
WHERE
    dish.type = 'Product' 
    AND m.c.value('(type/text())[1]', 'varchar(50)') = 'DISH' 
    AND m.c.value('(num/text())[1]', 'varchar(50)') IN ('94948475659909879621', 'code', 'code')

This query takes 2 minutes

SELECT 
    dish.id, 
    SUBSTRING(dish.xml, (CHARINDEX('<customValue>', dish.xml) + 13), (CHARINDEX('</customValue>', dish.xml) - CHARINDEX('<customValue>', dish.xml) - 13)) AS 'Dish', 
    SUBSTRING(outerEanCode.xml, (CHARINDEX('<outerEanCode>', outerEanCode.xml) + 14), (CHARINDEX('</outerEanCode>', outerEanCode.xml) - CHARINDEX('<outerEanCode>', outerEanCode.xml) - 14)) AS 'OuterEanCode'
FROM 
    [_fabbrica].[dbo].[entity] dish 
JOIN 
    [_fabbrica].[dbo].[entity] outerEanCode ON outerEanCode.id = CASE WHEN CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) > 0 THEN SUBSTRING( dish.xml, ( CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) + 39 ), ( CHARINDEX( '</outerEconomicActivityNomenclatureCode>', dish.xml ) - CHARINDEX( '<outerEconomicActivityNomenclatureCode>', dish.xml ) - 39 ) ) ELSE '61D63FE7-212C-4847-BA32-1563D97E2424' END
WHERE 
    dish.type = 'Product' 
    AND SUBSTRING(dish.xml, (CHARINDEX('<num>', dish.xml) + 5), (CHARINDEX('</num>', dish.xml) - CHARINDEX('<num>', dish.xml) - 5)) IN ('94948475659909879621')

Are there some ways to change or create another query to reduce the execution time?

Upvotes: 3

Views: 747

Answers (1)

Martin Smith
Martin Smith

Reputation: 453453

The m.c.value('(num/text())[1]', 'varchar(50)') in (...) list seems to be optimised badly (execution plan link from comments) with each element in the list adding a sub tree to the plan. The concatenation operator at the top of all these is showing an elapsed time of 1min 28 seconds.

enter image description here

One way to avoid this would be to just evaluate m.c.value('(num/text())[1]', 'varchar(50)') once per document and materializing the result to a temp table or table variable and then querying that with an IN.

But the below seems to solve the issue without that

SELECT m.c.value('(name/customValue/text())[1]', 'varchar(50)') as dish
FROM _fabbrica.dbo.entity dish
CROSS APPLY (SELECT CAST(dish.xml as xml) as realxml) s
CROSS APPLY s.realxml.nodes('r[type = "DISH"][num =(
    "46310660710877889975","94948475659909879621","94948475659909879704","94948474748856478602",
    "46310660710877881223","94948475659909879363","94948474748847478248","26310360710077875944",
    "94948475659909879207","94948475659909879700","94948475659909879650","94948474748856478612",
    "94948474748856478618","94948475659909879704","94948474748856478496","46310660710877896883",
    "46310660710877881223","94948475659909879363","94948474747747475446","94948475659909879366",
    "94948475659909879121","26310360710077875994","94948475659909877377","94948475659909879123",
    "94948475659909879124","94948475659909879122","94948474748847478246","94948475659909879549",
    "94948474748856478594","94948474748847478244","94948475659909879597","94948475659909879363",
    "94948474747747475446","94948475659909879366","94948475659909879121",
    "94948475659909879123","94948475659909879122","46310660710877890446"
)]') m(c)
WHERE dish.type = 'Product'

You say in the comments that this has improved things and post the below execution plan. This still looks very slow to me. The elapsed time is 12.162 seconds even though the index seek to select the data was only 0.024 seconds. This still means that 99.8% of the execution time was taken up with parsing XML.

enter image description here

It looks to me that the actual casting of the string to XML is deferred and that this casting is being repeated every time the expression Expr1002 is referenced in the plan (by the red highlighted operators). In particular the CPU time spent by the filter which does nothing except have a start-up predicate on [Expr1002] IS NOT NULL makes me think that.

If you are unable to change the source column to use XML datatype then you can do

SELECT CAST(xml AS xml) as realxml 
INTO #Product 
from _fabbrica.dbo.entity where type = 'Product' 

And then change the query to operate against that temp table without any casting needed.

Upvotes: 3

Related Questions