Reputation: 289
I'm trying to get a distinct list of node names from an XML file. I've been successful with a recursive CTE similar to https://stackoverflow.com/a/2274091/1735928, but only with my files below about 1m characters. Above that, the query never seems to return. Some of my files are in the neighborhood of 100m characters.
I've since switched to trying PowerShell. For this samle XML:
<?xml version="1.0" encoding="UTF_8"?>
<root>
<childA>
<descendant1>
<descendant1_1>
<descendant1_1_1>1111111111</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant2>0</descendant2>
</childA>
<childA>
<descendant1>
<descendant1_1>
<descendant1_1_1>2222222222</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant2>2</descendant2>
</childA>
<childB>
<descendant1>
<descendant1_1>
<descendant1_1_1>2222222222</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant3>0</descendant3>
</childB>
<childC>
<descendant4>0</descendant4>
</childC>
<childC>
<descendant4>6</descendant4>
</childC>
</root>
I've gotten as far as:
$xml.childnodes[1].childnodes | select -uniq | foreach { $xml.childnodes[1].($_.name).childnodes.name | select -uniq }
which gives me:
descendant1
descendant2
descendant1
descendant3
descendant4
but that doesn't include the further descendants. Ultimately, I'm trying to get a table back to SQL that looks like this:
root | childA | descendant1
root | childA | descendant1_1
root | childA | descendant1_1_1
root | childA | descendant2
root | childB | descendant1
root | childB | descendant1_1
root | childB | descendant1_1_1
root | childB | descendant3
root | childC | descendant4
Upvotes: 1
Views: 922
Reputation: 2355
Below is a naive solution, assuming you know the depth of your xml. But probably you can do entire thing with xqury and thus do it on SQL side
[xml]$x = "your xml here"
# ------ LEVEL 2 children
$L2 = $x | Select-Xml "//root/*/*"
foreach($n in $L2) {
$L1 = $n.node.ParentNode.LocalName
$CHILD = $n.node.localname
[PSCustomObject]@{L1=$L1; CHILD = $CHILD}
}
# ------ LEVEL 3 children
$L3 = $x | Select-Xml "//root/*/*/*"
foreach($n in $L3) {
$L1 = $n.node.ParentNode.ParentNode.LocalName
$CHILD = $n.node.localname
[PSCustomObject]@{L1=$L1; CHILD = $CHILD}
}
# ------ LEVEL 4 children
$L4 = $x | Select-Xml "//root/*/*/*/*"
foreach($n in $L4) {
$L1 = $n.node.ParentNode.ParentNode.ParentNode.LocalName
$CHILD = $n.node.localname
[PSCustomObject]@{L1=$L1; CHILD = $CHILD}
}
Also adding sql xquery version. It still requires knowledge of the structure and only do one level at the time, but it doesnt have cross apply/joins so probably will work better on huge files
select
T.c.query('local-name(.)') as self
,T.c.query('local-name(..)') as parent
,T.c.query('local-name(../..)') as Gparent
,T.c.query('local-name(../../..)') as GGparent
from @x.nodes('/root/*/*/*/*') T(c)
Upvotes: 1
Reputation: 3429
If we are talking about hard-coded solutions to this problem, then here is my solution using SQL Server.
DECLARE @x XML = '
<root>
<childA>
<descendant1>
<descendant1_1>
<descendant1_1_1>1111111111</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant2>0</descendant2>
</childA>
<childA>
<descendant1>
<descendant1_1>
<descendant1_1_1>2222222222</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant2>2</descendant2>
</childA>
<childB>
<descendant1>
<descendant1_1>
<descendant1_1_1>2222222222</descendant1_1_1>
</descendant1_1>
</descendant1>
<descendant3>0</descendant3>
</childB>
<childC>
<descendant4>0</descendant4>
</childC>
<childC>
<descendant4>6</descendant4>
</childC>
</root>
';
SELECT
x.n.value('fn:local-name(.)', 'NVARCHAR(MAX)') root,
L2.n.value('fn:local-name(.)', 'NVARCHAR(MAX)') L2,
L3.n.value('fn:local-name(.)', 'NVARCHAR(MAX)') L3,
L4.n.value('fn:local-name(.)', 'NVARCHAR(MAX)') L4,
L5.n.value('fn:local-name(.)', 'NVARCHAR(MAX)') L5
FROM @x.nodes('/*') x(n)
OUTER APPLY x.n.nodes('*') L2(n)
OUTER APPLY L2.n.nodes('*') L3(n)
OUTER APPLY L3.n.nodes('*') L4(n)
OUTER APPLY L4.n.nodes('*') L5(n);
Output
+------+--------+-------------+---------------+-----------------+
| root | L2 | L3 | L4 | L5 |
+------+--------+-------------+---------------+-----------------+
| root | childA | descendant1 | descendant1_1 | descendant1_1_1 |
| root | childA | descendant2 | | |
| root | childA | descendant1 | descendant1_1 | descendant1_1_1 |
| root | childA | descendant2 | | |
| root | childB | descendant1 | descendant1_1 | descendant1_1_1 |
| root | childB | descendant3 | | |
| root | childC | descendant4 | | |
| root | childC | descendant4 | | |
+------+--------+-------------+---------------+-----------------+
Upvotes: 1