Jeremy

Reputation: 46320

SQL Server XML performance optimization

I've got 34 rows in a table, and each row has a column containing XML. The XML is actually stored in an NVARCHAR(MAX) column, not an XML column.

For each row I am selecting values in the xml elements as a single resultset. The performance is pretty poor. I've tried two different queries. The first takes roughly 22 seconds to execute and the second takes 7.

Even at 7 seconds, this is far slower than I'd like. I'm hoping for 1-2 seconds at most.

So then I read a rumor online that if you convert the NVARCHAR data to XML using a temp table or table variable, you will achieve a performance gain, which at least in my case was true... It now executes in under a second. What I'm looking for now is an explanation that can tell me why these two changes (using text() and pre-converting to XML) actually affect performance.

22 seconds:

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('name[1]','varchar(100)'),
    [Type] = d.c.value('transportName[1]','varchar(100)'),
    [Enabled] = d.c.value('enabled[1]','BIT'),
    [Queued] = d.c.value('properties[1]/destinationConnectorProperties[1]/queueEnabled[1]','varchar(100)'),
    [RetryInterval] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryIntervalMillis[1]','INT'),
    [MaxRetries] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryCount[1]','INT'),
    [RotateQueue] = d.c.value('properties[1]/destinationConnectorProperties[1]/rotate[1]','BIT'),
    [ThreadCount] = d.c.value('properties[1]/destinationConnectorProperties[1]/threadCount[1]','INT'),
    [WaitForPrevious] = d.c.value('waitForPrevious[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('properties[1]/channelId[1]','varchar(100)'),
        d.c.value('properties[1]/remoteAddress[1]','varchar(100)'),
        d.c.value('properties[1]/wsdlUrl[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('properties[1]/remotePort[1]','varchar(100)'),
        d.c.value('properties[1]/port[1]','varchar(1024)')),
    [Service] = d.c.value('properties[1]/service[1]','varchar(1024)'),
    [Operation] = d.c.value('properties[1]/operation[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG] = Convert(XML, Channel)
    FROM
            dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

7 seconds, due to the use of text(). I have no idea why text() speeds things up.

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('(name/text())[1]','varchar(100)'),
    [Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
    [Enabled] = d.c.value('(enabled/text())[1]','BIT'),
    [Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
    [RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
    [MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
    [RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
    [ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
    [WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('(properties/channelId/text())[1]','varchar(100)'),
        d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
        d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
        d.c.value('(properties/port/text())[1]','varchar(1024)')),
    [Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
    [Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG] = Convert(XML, Channel)
    FROM
            dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

This query uses the text() approach but first converts the NVARCHAR column to an XML column in a table variable. It executes in less than a second...

DECLARE @Xml AS TABLE (
    [ID] NVARCHAR(36) NOT NULL Primary Key,
    [Name] NVARCHAR(100) NOT NULL,
    [CFG] XML NOT NULL
);

INSERT INTO @Xml (ID, Name, CFG)
SELECT
    c.ID,
    c.Name,
    Convert(XML, c.Channel)
FROM
    [dbo].[CHANNEL] c;

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('(name/text())[1]','varchar(100)'),
    [Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
    [Enabled] = d.c.value('(enabled/text())[1]','BIT'),
    [Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
    [RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
    [MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
    [RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
    [ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
    [WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('(properties/channelId/text())[1]','varchar(100)'),
        d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
        d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
        d.c.value('(properties/port/text())[1]','varchar(1024)')),
    [Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
    [Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG]
    FROM
            @Xml
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

Upvotes: 2

Views: 620

Answers (1)

Gottfried Lesigang

Reputation: 67291

I can give you one answer and one guess:

First I use a declared table variable to mock up your scenario:

DECLARE @tbl TABLE(s NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'<root>
    <SomeElement>This is first text of element1
        <InnerElement>This is text of inner element1</InnerElement>
        This is second text of element1
    </SomeElement>
    <SomeElement>This is first text of element2
        <InnerElement>This is text of inner element2</InnerElement>
        This is second text of element2
    </SomeElement>
</root>')
,(N'<root>
    <SomeElement>This is first text of elementA
        <InnerElement>This is text of inner elementA</InnerElement>
        This is second text of elementA
    </SomeElement>
    <SomeElement>This is first text of elementB
        <InnerElement>This is text of inner elementB</InnerElement>
        This is second text of elementB
    </SomeElement>
</root>');

--This query reads the XML with a cast inside a sub-select. You might use a CTE instead, but that should be syntactic sugar only...

SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
      ,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
      ,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
      ,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM (SELECT CAST(s AS XML) FROM @tbl) AS tbl(TheXml)
CROSS APPLY TheXml.nodes(N'/root/SomeElement') AS A(se);

--The second part writes the converted XML into a table with an XML column and reads from there:

DECLARE @tbl2 TABLE(x XML)
INSERT INTO @tbl2
SELECT CAST(s AS XML) FROM @tbl;

SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
      ,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
      ,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
      ,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM @tbl2 t2
CROSS APPLY t2.x.nodes(N'/root/SomeElement') AS A(se);

Why is /text() faster than without /text()?

If you look at my example, the content of an element is everything from the opening tag to the closing tag, nested elements included. The text() of an element is the floating text between these tags. You can see this in the results of the SELECT above. A text() node is actually stored as a separate portion of a tree structure (see the next section), so fetching it is a one-step action. Without text(), a complex structure has to be analysed to find everything between the opening tag and its corresponding closing tag, even if there is nothing in there but the text().
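The difference can be seen in isolation with a minimal sketch on a single XML variable. The variable name @x and the sample content are made up for illustration; only the value() method and the XPath expressions mirror the queries above.

```sql
-- Hypothetical one-element sample, not taken from the question's data
DECLARE @x XML =
    N'<SomeElement>first text<InnerElement>inner</InnerElement>second text</SomeElement>';

SELECT
     -- Whole element: all descendant text nodes concatenated
     -- returns 'first textinnersecond text'
     @x.value(N'(/SomeElement)[1]', 'nvarchar(max)')        AS WholeContent
     -- First text node only: returns 'first text'
    ,@x.value(N'(/SomeElement/text())[1]', 'nvarchar(max)') AS FirstTextNode
     -- Second text node only: returns 'second text'
    ,@x.value(N'(/SomeElement/text())[2]', 'nvarchar(max)') AS SecondTextNode;
```

Without text(), the engine has to walk into InnerElement to assemble the value. With text(), it can stop at the first text node it finds.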

Why should I store XML in the appropriate type?

XML is not just text with some silly extra characters! It is a document with a complex structure. XML is not stored as the text you see; it is stored in a tree structure. Whenever you cast a string that represents XML into real XML, this very expensive work must be done. When the XML is presented to you (or to any other output), the representing string is (re)built from scratch.
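As a small sketch of that idea, the batch below parses a string once into an XML variable and then reads from the parsed value several times. The names @s and @x and the sample document are assumptions for illustration only.

```sql
-- Hypothetical sample string; in the question this would be the NVARCHAR(MAX) column
DECLARE @s NVARCHAR(MAX) = N'<root><a>1</a><b>2</b></root>';

-- The expensive step: parse the string into the native XML type, once
DECLARE @x XML = CAST(@s AS XML);

-- Every further read works on the already parsed tree
SELECT @x.value(N'(/root/a/text())[1]', 'int') AS a
      ,@x.value(N'(/root/b/text())[1]', 'int') AS b;

-- Casting back to a string rebuilds the text from the tree,
-- so the result need not match the input character for character
SELECT CAST(CAST(N'<root><a>1</a>   <b></b></root>' AS XML) AS NVARCHAR(MAX)) AS Rebuilt;
```

The last statement is only there to show that the string coming out is a fresh serialization rather than the stored input.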

Why is the pre-casted approach faster?

This is guessing...
In my example both approaches are about equal and lead to (almost) the same execution plan.
SQL Server will not work through everything the way you might expect. This is not a procedural system where you state "do this, then do this, and after that do this!". You tell the engine what you want, and the engine decides how best to do it. And the engine is pretty good at this!
Before execution starts, the engine tries to estimate the costs of the possible approaches. In that estimate, CONVERT (or CAST) counts as a rather cheap operation. It could be that the engine decides to work down the list of your calls and do the cast for each single need over and over, because it thinks that this is cheaper than the expensive creation of a derived table...
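One way to check this guess on your own data is to run both variants with timing statistics switched on and compare the numbers (and the actual execution plans). The following is only a sketch with made-up names (@src, @parsed) and tiny sample data, so it will not reproduce the 7-second difference by itself.

```sql
SET STATISTICS TIME ON;  -- timings appear in the Messages output

-- Hypothetical stand-in for the NVARCHAR(MAX) source column
DECLARE @src TABLE (s NVARCHAR(MAX));
INSERT INTO @src VALUES
 (N'<root><e>1</e><e>2</e></root>')
,(N'<root><e>3</e></root>');

-- Variant 1: cast inline; the engine is free to repeat the cast per value() call
SELECT A.e.value(N'(./text())[1]', 'int') AS v
FROM (SELECT CAST(s AS XML) FROM @src) AS t(x)
CROSS APPLY t.x.nodes(N'/root/e') AS A(e);

-- Variant 2: materialize the parsed XML first, then query the stored XML
DECLARE @parsed TABLE (x XML);
INSERT INTO @parsed (x)
SELECT CAST(s AS XML) FROM @src;

SELECT A.e.value(N'(./text())[1]', 'int') AS v
FROM @parsed AS p
CROSS APPLY p.x.nodes(N'/root/e') AS A(e);

SET STATISTICS TIME OFF;
```

If the guess is right, the plan for variant 1 on the real table should show the conversion being evaluated many times, while variant 2 pays for it once during the INSERT.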

Upvotes: 2
