katit
katit

Reputation: 17915

T-SQL Parse XML with identical nodes on same row

I need to parse following XML: https://www.iftach.org/taxmatrix/charts/4Q2016.xml

Each of those RECORD nodes have to be in separate row:

<RECORD>
        <JURISDICTION ID="#15"  >AB</JURISDICTION>
        <COUNTRY>CAN</COUNTRY>
        <FUEL_TYPE>Gasoline</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Special Diesel</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Gasohol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Propane</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.2700</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0940</RATE>
        <FUEL_TYPE>LNG</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
        <FUEL_TYPE>CNG</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
        <FUEL_TYPE>Ethanol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Methanol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>E-85</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>M-85</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>A55</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Biodiesel</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
    </RECORD>

I want following columns in my set:

Is this even possible in T_SQL? In C# I would just read XML top to bottom and connect TYPE to RATE based on position in XML. It's more convenient for me to have this in SQL though..

Upvotes: 2

Views: 43

Answers (2)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67311

Here is another solution, which might be a bit easier. The idea is roughly the same as in Dan's approach (which is great!).

The differenc: I use an APPLY .nodes() to get all the <RATE>nodes and look for their last preceeding <FUEL_TYPE>*.

SELECT @xml.value(N'(/RECORD/JURISDICTION/@ID)[1]',N'nvarchar(max)') AS JURISDICTION_ID
      ,@xml.value(N'(/RECORD/JURISDICTION/text())[1]',N'nvarchar(max)') AS JURISDICTION
      ,@xml.value(N'(/RECORD/COUNTRY/text())[1]',N'nvarchar(max)') AS COUNTRY
      ,r.value(N'let $r:=. return ../FUEL_TYPE[.<<$r][last()]',N'varchar(max)') AS FUEL_TYPE
      ,r.value(N'(./@COUNTRY)[1]',N'nvarchar(max)') AS RATE_COUNTRY
      ,r.value(N'(./@RATECHANGE)[1]',N'int') AS RATE_RATECHANGE
      ,r.value(N'(./text())[1]',N'decimal(8,4)') AS RATE

FROM @xml.nodes(N'/RECORD/RATE') AS A(r)

The first lines of the result:

+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| JURISDICTION_ID | JURISDICTION | COUNTRY | FUEL_TYPE      | RATE_COUNTRY | RATE_RATECHANGE | RATE   |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15             | AB           | CAN     | Gasoline       | US           | 0               | 0.3734 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15             | AB           | CAN     | Gasoline       | CAN          | 0               | 0.1300 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15             | AB           | CAN     | Special Diesel | US           | 0               | 0.3734 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15             | AB           | CAN     | Special Diesel | CAN          | 0               | 0.1300 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15             | AB           | CAN     | Gasohol        | US           | 0               | 0.3734 |

UPDATE If hardcoded and in fixed order...

What about this brute force approach?

WITH Positions AS(SELECT * FROM (VALUES (3),(6),(9),(12),(15),(18),(21),(24),(27),(30),(33),(36)) AS v(Nr))
SELECT @xml.value(N'(/RECORD/*[sql:column("Positions.Nr")]/text())[1]',N'nvarchar(max)') AS FUEL_TYPE
      ,@xml.value(N'(/RECORD/*[sql:column("Positions.Nr")+1]/text())[1]',N'nvarchar(max)') AS RATE_US
      ,@xml.value(N'(/RECORD/*[sql:column("Positions.Nr")+2]/text())[1]',N'nvarchar(max)') AS RATE_CAN

FROM Positions

Upvotes: 3

Dan Field
Dan Field

Reputation: 21641

So as far as getting the XML going, which to my mind is the harder part, yes you can:

DECLARE @xml xml = '<RECORD>
        <JURISDICTION ID="#15"  >AB</JURISDICTION>
        <COUNTRY>CAN</COUNTRY>
        <FUEL_TYPE>Gasoline</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Special Diesel</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Gasohol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Propane</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.2700</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0940</RATE>
        <FUEL_TYPE>LNG</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
        <FUEL_TYPE>CNG</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
        <FUEL_TYPE>Ethanol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Methanol</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>E-85</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>M-85</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>A55</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
        <FUEL_TYPE>Biodiesel</FUEL_TYPE>
        <RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
        <RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
    </RECORD>'


;WITH xCTE AS
(
    SELECT @xml.query('
    <Data jurisdiction="{data(/RECORD/JURISDICTION/text())}" country="{data(/RECORD/COUNTRY/text())}">
        {
            for $x in (/RECORD/*[local-name() = "RATE"]) 
            return
                <FuelType name="{data(/RECORD/FUEL_TYPE[. << $x][last()])}" country="{data($x/@COUNTRY)}" value="{data($x)}" />
        }
    </Data>
        ') AS DocXml
)
SELECT
    x.n.value('../@jurisdiction', 'VARCHAR(10)') as Jurisdiction
    ,x.n.value('../@country', 'VARCHAR(5)') as Country
    ,x.n.value('@name', 'VARCHAR(20)') as Name
    ,x.n.value('@country', 'VARCHAR(5)') as RateCountry
    ,x.n.value('@value', 'DECIMAL(10,4)') as Rate
FROM xCTE
CROSS APPLY DocXml.nodes('/Data/FuelType') x(n)

Basic idea: use XQuery in a CTE to transform the original document into something more manageable - getting the last FUEL_TYPE node that came before the current RATE node, and stuffing everything into a single XML element (or group of elements - attributes are a bit more compact and easier to work with here IMO).

This doesn't quite give you the output you want though - it gives an output like this:

Jurisdiction Country Name                 RateCountry Rate
------------ ------- -------------------- ----------- ---------------------------------------
AB           CAN     Gasoline             US          0.3734
AB           CAN     Gasoline             CAN         0.1300
AB           CAN     Special Diesel       US          0.3734
AB           CAN     Special Diesel       CAN         0.1300
...

If you want to go from there you should be able to do some kind of CROSS APPLY or PIVOT, but before I'd go down that road I'd consider if that's really what you want or if this will actually work better.

Upvotes: 3

Related Questions