Reputation: 1660
What is the simplest way to parse XML into columns with SQL Server. Example:
<ns0:root xmlns:ns0="http://herp...">
<ns1:FirstElement xmlns:ns1="http://derpyderp...">
<ns1:FirstElementID>AAF11303</ns1:FirstElementID>
<ns1:FirstEValue>some random text</ns1:FirstEValue>
<ns1:SecondElement>
<ns1:Something>Asdsad</ns1:Something>
<ns1:Else>
<ns1:Stuff>sdf</ns1:Stuff>
<ns1:StuffVal>15</ns1:StuffVal>
</ns1:Else>
<ns1:Else>
<ns1:Stuff>jarjar</ns1:Stuff>
<ns1:StuffVal>16</ns1:StuffVal>
<ns1:StuffParam>true</ns1:StuffParam>
</ns1:Else>
</ns1:SecondElement>
<ns1:randValue>dosd</ns1:randValue>
</ns1:FirstElement>
<ns1:FirstElement>
<ns1:FirstElementID>DDF00301</ns1:FirstElementID>
<ns1:FirstEValue/>
<ns1:SecondElement>
<ns1:Else>
<ns1:Stuff>yessir</ns1:Stuff>
<ns1:StuffVal>0</ns1:StuffVal>
</ns1:Else>
</ns1:SecondElement>
</ns1:FirstElement>
<!-- ... times n the first element with a variating amount of children up to 15 levels deep -->
</ns0:root>
I'd like this as a simple column output, ie:
FIRSTELEMENTID | FIRSTEVALUE | SOMETHING | ELSE.STUFF | ELSE.STUFFVAL | ELSE.STUFFPARAM | RANDVALUE
'AAF11303' |'some random text'| 'Asdasd' | 'sdf' | 15 | NULL | 'dosd'
'AAF11303' |'some random text'| 'Asdasd' | 'jarjar' | 16 | TRUE | 'dosd'
'DDF00301' | NULL | NULL | 'yessir' | 0 | NULL | NULL
Now, the actual XML is a lot more complex and all the examples I've seen so far involve manually parsing the XML and basically passing each child element as its own XML and parsing that separately in the next loop of the script etc etc etc.
Is there any way to just output the values flat out like that? Manually parsing several MB of XML with countless variations in which elements appear and how many times and where would make it a days long operation to do.
I've been thinking about doing a CLR assembly to just do this with C# and pass the result into SQL as a table, but I'd like to know if there's another way all the same.
Thanks!
Upvotes: 1
Views: 4015
Reputation: 117345
Basically, you can use nodes()
and values()
methods to parse your xml (and with xmlnamespaces
to handle namespaces). Something like this:
;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
T.C.value('(../../ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
T.C.value('(../../ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
T.C.value('(../ns1:Something/text())[1]', 'nvarchar(max)') as Something,
T.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
T.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
T.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
T.C.value('(../../ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else') as T(C)
This one create one row for each ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else
and then take all values you need (some of them taken from parent nodes). Note that if your First
FirstElement
does not contain any of the ns1:SecondElement/ns1:Else
nodes, it will not appear in the resultset. In that case, you may want to use a query like this:
;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
F.C.value('(ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
F.C.value('(ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
S.C.value('(ns1:Something/text())[1]', 'nvarchar(max)') as Something,
E.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
E.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
E.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
F.C.value('(ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement') as F(C)
outer apply F.C.nodes('ns1:SecondElement') as S(C)
outer apply S.C.nodes('ns1:Else') as E(C)
Upvotes: 3