Kahn
Kahn

Reputation: 1660

Parsing XML to Columns using SQL Server

What is the simplest way to parse XML into columns with SQL Server. Example:

<ns0:root xmlns:ns0="http://herp...">
    <ns1:FirstElement xmlns:ns1="http://derpyderp...">
        <ns1:FirstElementID>AAF11303</ns1:FirstElementID>
        <ns1:FirstEValue>some random text</ns1:FirstEValue>
        <ns1:SecondElement>
            <ns1:Something>Asdsad</ns1:Something>
            <ns1:Else>
                <ns1:Stuff>sdf</ns1:Stuff>
                <ns1:StuffVal>15</ns1:StuffVal>
            </ns1:Else>
            <ns1:Else>
                <ns1:Stuff>jarjar</ns1:Stuff>
                <ns1:StuffVal>16</ns1:StuffVal>
                <ns1:StuffParam>true</ns1:StuffParam>
            </ns1:Else>
        </ns1:SecondElement>
        <ns1:randValue>dosd</ns1:randValue>
    </ns1:FirstElement>
    <ns1:FirstElement>
        <ns1:FirstElementID>DDF00301</ns1:FirstElementID>
        <ns1:FirstEValue/>
        <ns1:SecondElement>
            <ns1:Else>
                <ns1:Stuff>yessir</ns1:Stuff>
                <ns1:StuffVal>0</ns1:StuffVal>
            </ns1:Else>
        </ns1:SecondElement>        
    </ns1:FirstElement>
    <!-- ... times n the first element with a variating amount of children up to 15 levels deep -->
</ns0:root>

I'd like this as a simple column output, ie:

FIRSTELEMENTID | FIRSTEVALUE      | SOMETHING | ELSE.STUFF | ELSE.STUFFVAL | ELSE.STUFFPARAM | RANDVALUE
'AAF11303'     |'some random text'| 'Asdasd'  | 'sdf'      | 15            | NULL            | 'dosd'
'AAF11303'     |'some random text'| 'Asdasd'  | 'jarjar'   | 16            | TRUE            | 'dosd'
'DDF00301'     | NULL             | NULL      | 'yessir'   | 0             | NULL            | NULL

Now, the actual XML is a lot more complex and all the examples I've seen so far involve manually parsing the XML and basically passing each child element as its own XML and parsing that separately in the next loop of the script etc etc etc.

Is there any way to just output the values flat out like that? Manually parsing several MB of XML with countless variations in which elements appear and how many times and where would make it a days long operation to do.

I've been thinking about doing a CLR assembly to just do this with C# and pass the result into SQL as a table, but I'd like to know if there's another way all the same.

Thanks!

Upvotes: 1

Views: 4015

Answers (1)

roman
roman

Reputation: 117345

Basically, you can use nodes() and values() methods to parse your xml (and with xmlnamespaces to handle namespaces). Something like this:

;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
    T.C.value('(../../ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
    T.C.value('(../../ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
    T.C.value('(../ns1:Something/text())[1]', 'nvarchar(max)') as Something,
    T.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
    T.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
    T.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
    T.C.value('(../../ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else') as T(C)

This one create one row for each ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else and then take all values you need (some of them taken from parent nodes). Note that if your First FirstElement does not contain any of the ns1:SecondElement/ns1:Else nodes, it will not appear in the resultset. In that case, you may want to use a query like this:

;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
    F.C.value('(ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
    F.C.value('(ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
    S.C.value('(ns1:Something/text())[1]', 'nvarchar(max)') as Something,
    E.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
    E.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
    E.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
    F.C.value('(ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement') as F(C)
    outer apply F.C.nodes('ns1:SecondElement') as S(C)
    outer apply S.C.nodes('ns1:Else') as E(C)

sql fiddle demo

Upvotes: 3

Related Questions