XML query very slow when the .value method on the XML type contains an attribute filter

Question

I have a single XML value of about 13 MB. The file is publicly available at https://www.profinfo.pl/export/bookseller.xml. It actually contains an invalid character, but I remove all such characters before passing it to SQL Server.

The file contains the following XML structure:

…

Now I want to execute the following query in SQL:

SELECT
    product.value('productId[1]', 'VARCHAR(50)') AS id,
    product.value('(productAttributes/productAttribute[@name="Format"])[1]', 'NVARCHAR(255)') AS format
FROM @xml.nodes('/products/product') p(product)

The problem is that the query is extremely slow. I found a way to work around this: I can rewrite the query so that it extracts the productAttributes node as a separate XML value and queries only that part:

SELECT
    product.value('productId[1]', 'VARCHAR(50)') AS id,
    attrib.value('productAttribute[@name="Format"][1]', 'NVARCHAR(255)') AS format
FROM @xml.nodes('/products/product') p(product)
    OUTER APPLY (SELECT product.query('productAttributes/productAttribute') AS attrib) a

but I still don't understand why the first query performs so poorly.
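
For anyone who wants to try both forms without downloading the full file, here is a minimal sketch that runs them against a tiny document. The sample XML is hypothetical: its element names are inferred only from the XPath expressions in the two queries, and the values (A4, 120, 80) are made up. It is not a copy of the real file and is far too small to show the slowdown. It should only show that both forms return the same result.

```sql
-- Hypothetical miniature document. The structure is inferred from the XPath
-- expressions in the queries, not copied from the real 13 MB file.
DECLARE @xml XML = N'
<products>
  <product>
    <productId>1</productId>
    <productAttributes>
      <productAttribute name="Format">A4</productAttribute>
      <productAttribute name="Pages">120</productAttribute>
    </productAttributes>
  </product>
  <product>
    <productId>2</productId>
    <productAttributes>
      <productAttribute name="Pages">80</productAttribute>
    </productAttributes>
  </product>
</products>';

-- Form 1: attribute filter applied directly in .value()
-- (this is the form that is slow on the large document)
SELECT
    product.value('productId[1]', 'VARCHAR(50)') AS id,
    product.value('(productAttributes/productAttribute[@name="Format"])[1]', 'NVARCHAR(255)') AS format
FROM @xml.nodes('/products/product') p(product);

-- Form 2: extract the attributes with .query() first, then filter that fragment
SELECT
    product.value('productId[1]', 'VARCHAR(50)') AS id,
    attrib.value('productAttribute[@name="Format"][1]', 'NVARCHAR(255)') AS format
FROM @xml.nodes('/products/product') p(product)
    OUTER APPLY (SELECT product.query('productAttributes/productAttribute') AS attrib) a;
```

On this sample both statements should return the format A4 for product 1 and NULL for product 2, which has no Format attribute.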

For testing, I limited the returned data to the TOP 100 rows and compared the plans of both queries.

The most suspicious fragment is the XML Reader with the XPath filter, which returns 3139900 rows.

This is equal to 1847 (the number of product elements in the whole file) times 17 (the number of productAttribute elements in each product) times 100 (the number of rows returned by TOP). Without the TOP limit, this operator would return 1847 * 1847 * 17 = 57993953 rows, which I think explains the poor performance. The row count grows quadratically with the number of products in the XML file.
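
The two row counts can be double-checked with plain integer arithmetic. The figures 1847, 17 and 100 are the ones quoted above. The column aliases are only illustrative names.

```sql
-- Rows produced by the XML Reader, using the counts quoted in the question
SELECT
    1847 * 17 * 100  AS reader_rows_with_top_100,  -- 3139900
    1847 * 1847 * 17 AS reader_rows_without_top;   -- 57993953
```

Both results fit in a 32-bit INT, so no cast is needed.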

Interestingly, the estimated cost of the first, much slower query is 31% of the batch, while the second is estimated at 69%.

Does anyone know why filtering without an additional .query method is so expensive?
