jmucchiello
jmucchiello

Reputation: 18984

Parse XML into a (XPath, Value) pair

Working with XML in SQL Server, given this XML:

    <A>
        <B>123</B>
        <C>
              <Cs>234</Cs>
              <Cs>345</Cs>
              <Cs>12</Cs>
              <Cs>2346</Cs>
        </Cs>
    </A>

I'd like to produce a result set that looks like this:

xpath value
(/A/B)[1] 123
(/A/C/Cs)[1] 234
(/A/C/Cs)[2] 345
(/A/C/Cs)[3] 12
(/A/C/Cs)[4] 2346

Is there a trick that can do this without walking through the XML? Added bonus would include the ability to start somewhere other than the document root. You could pass /A/C to this routine and it would only give the paths under that element.

Upvotes: 1

Views: 384

Answers (2)

Charlieface
Charlieface

Reputation: 71578

You can use a recursive CTE. You pass in the XML document in @xml. If you need to use a table to get the XML, you can use CROSS APPLY YourXml.nodes instead of FROM @XML.nodes.

WITH cte AS (
    SELECT
      xpath = CONCAT(v.name, '[', ROW_NUMBER() OVER (PARTITION BY v.name ORDER BY (SELECT 1)), ']'),
      value = x.nd.value('text()[1]','nvarchar(100)'),
      child = x.nd.query('*')
    FROM @xml.nodes('*') x(nd)
    CROSS APPLY (VALUES (x.nd.value('local-name(.)[1]','nvarchar(max)'))) v(name)
    UNION ALL
    SELECT
      xpath = CONCAT(cte.xpath, '/', v.name, '[', ROW_NUMBER() OVER (PARTITION BY xpath, v.name ORDER BY (SELECT 1)), ']'),
      value = x.nd.value('text()[1]','nvarchar(100)'),
      child = x.nd.query('*')
    FROM cte
    CROSS APPLY cte.child.nodes('*') x(nd)
    CROSS APPLY (VALUES (x.nd.value('local-name(.)[1]','nvarchar(max)'))) v(name)
)
SELECT
  xpath = CONCAT('/', xpath, '/text()[1]'),
  value
FROM cte
WHERE value IS NOT NULL;

db<>fiddle

Unfortunately, you cannot use the ancestor:: axis, which would have made this much easier.

If SQL Server supported ancestor:: you could do something like this

SELECT
  xpath = '(' + x.nd.query('for $n in ancestor::* return concat("/", local-name($n))') + '/text())[1]',
  value = x.nd.value('text()[1]','nvarchar(100)')
FROM @xml.nodes('//*[text()]') x(nd)

Upvotes: 0

Yitzhak Khabinsky
Yitzhak Khabinsky

Reputation: 22187

This is one of the rare cases when the archaic OPENXML() is handy.

XQuery 3.0 introduced a real solution for such task: fn:path() function long time ago in 2014. Unfortunately, MS SQL Server supports just a subset of XQuery 1.0

Back to mundane earth.

SQL

DECLARE @xml XML =
N'<A>
    <B>123</B>
    <C>
            <Cs>234</Cs>
            <Cs>345</Cs>
            <Cs>12</Cs>
            <Cs>2346</Cs>
    </C>
</A>';

DECLARE @DocHandle INT;
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xml;

;WITH rs AS
(
   SELECT * FROM OPENXML(@DocHandle,'/*')
), cte AS
(
   -- anchor
    SELECT id
      ,ParentID
      --, nodetype
      , [text]
      ,CAST(id AS VARCHAR(100)) AS [Path]
      ,CAST('/' + rs.localname AS VARCHAR(1000)) 
      + N'[' 
        + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NVARCHAR)  
        + N']' AS [XPath]
    FROM rs
    WHERE ParentID IS NULL
    UNION ALL
    --recursive member
    SELECT t.id
      ,t.ParentID
      --, nodetype = (SELECT nodetype FROM rs WHERE id = t.ParentID)
      , t.[text]
      , CAST(a.[Path] + ',' + CAST( t.ID AS VARCHAR(100)) AS VARCHAR(100)) AS [Path]
      , CAST(a.[XPath] + '/' + IIF(t.nodetype = 2, '@', '')
         + t.localname AS VARCHAR(1000)) 
         + N'['
        + TRY_CAST(ROW_NUMBER() OVER(PARTITION BY t.localname ORDER BY (SELECT 1)) AS NVARCHAR)
        + N']' AS [XPath]
    FROM rs AS t
      INNER JOIN cte AS a ON t.ParentId = a.id
)
SELECT ID, ParentID, /*nodetype,*/ [Path]
   , REPLACE([XPath],'#text','text()') AS XPath
   , [text] AS [Value]
FROM cte
WHERE [text] IS NOT NULL
    --AND CAST([text] AS VARCHAR(30)) = '12345'
ORDER BY [Path];

EXEC sp_xml_removedocument @DocHandle;

Output

+----+----------+----------+----------------------------+-------+
| ID | ParentID |   Path   |           XPath            | Value |
+----+----------+----------+----------------------------+-------+
|  8 |        2 |    0,2,8 | /A[1]/B[1]/text()[1]       |   123 |
|  9 |        4 |  0,3,4,9 | /A[1]/C[1]/Cs[1]/text()[1] |   234 |
| 10 |        5 | 0,3,5,10 | /A[1]/C[1]/Cs[2]/text()[1] |   345 |
| 11 |        6 | 0,3,6,11 | /A[1]/C[1]/Cs[3]/text()[1] |    12 |
| 12 |        7 | 0,3,7,12 | /A[1]/C[1]/Cs[4]/text()[1] |  2346 |
+----+----------+----------+----------------------------+-------+

Upvotes: 1

Related Questions