Reputation: 17915
I need to parse following XML: https://www.iftach.org/taxmatrix/charts/4Q2016.xml
Each of those RECORD nodes have to be in separate row:
<RECORD>
<JURISDICTION ID="#15" >AB</JURISDICTION>
<COUNTRY>CAN</COUNTRY>
<FUEL_TYPE>Gasoline</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Special Diesel</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Gasohol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Propane</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.2700</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0940</RATE>
<FUEL_TYPE>LNG</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
<FUEL_TYPE>CNG</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
<FUEL_TYPE>Ethanol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Methanol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>E-85</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>M-85</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>A55</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Biodiesel</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
</RECORD>
I want following columns in my set:
Is this even possible in T_SQL? In C# I would just read XML top to bottom and connect TYPE to RATE based on position in XML. It's more convenient for me to have this in SQL though..
Upvotes: 2
Views: 43
Reputation: 67311
Here is another solution, which might be a bit easier. The idea is roughly the same as in Dan's approach (which is great!).
The differenc: I use an APPLY .nodes()
to get all the <RATE>
nodes and look for their last preceeding <FUEL_TYPE>
*.
SELECT @xml.value(N'(/RECORD/JURISDICTION/@ID)[1]',N'nvarchar(max)') AS JURISDICTION_ID
,@xml.value(N'(/RECORD/JURISDICTION/text())[1]',N'nvarchar(max)') AS JURISDICTION
,@xml.value(N'(/RECORD/COUNTRY/text())[1]',N'nvarchar(max)') AS COUNTRY
,r.value(N'let $r:=. return ../FUEL_TYPE[.<<$r][last()]',N'varchar(max)') AS FUEL_TYPE
,r.value(N'(./@COUNTRY)[1]',N'nvarchar(max)') AS RATE_COUNTRY
,r.value(N'(./@RATECHANGE)[1]',N'int') AS RATE_RATECHANGE
,r.value(N'(./text())[1]',N'decimal(8,4)') AS RATE
FROM @xml.nodes(N'/RECORD/RATE') AS A(r)
The first lines of the result:
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| JURISDICTION_ID | JURISDICTION | COUNTRY | FUEL_TYPE | RATE_COUNTRY | RATE_RATECHANGE | RATE |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15 | AB | CAN | Gasoline | US | 0 | 0.3734 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15 | AB | CAN | Gasoline | CAN | 0 | 0.1300 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15 | AB | CAN | Special Diesel | US | 0 | 0.3734 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15 | AB | CAN | Special Diesel | CAN | 0 | 0.1300 |
+-----------------+--------------+---------+----------------+--------------+-----------------+--------+
| #15 | AB | CAN | Gasohol | US | 0 | 0.3734 |
What about this brute force approach?
WITH Positions AS(SELECT * FROM (VALUES (3),(6),(9),(12),(15),(18),(21),(24),(27),(30),(33),(36)) AS v(Nr))
SELECT @xml.value(N'(/RECORD/*[sql:column("Positions.Nr")]/text())[1]',N'nvarchar(max)') AS FUEL_TYPE
,@xml.value(N'(/RECORD/*[sql:column("Positions.Nr")+1]/text())[1]',N'nvarchar(max)') AS RATE_US
,@xml.value(N'(/RECORD/*[sql:column("Positions.Nr")+2]/text())[1]',N'nvarchar(max)') AS RATE_CAN
FROM Positions
Upvotes: 3
Reputation: 21641
So as far as getting the XML going, which to my mind is the harder part, yes you can:
DECLARE @xml xml = '<RECORD>
<JURISDICTION ID="#15" >AB</JURISDICTION>
<COUNTRY>CAN</COUNTRY>
<FUEL_TYPE>Gasoline</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Special Diesel</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Gasohol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Propane</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.2700</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0940</RATE>
<FUEL_TYPE>LNG</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
<FUEL_TYPE>CNG</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.0000</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.0000</RATE>
<FUEL_TYPE>Ethanol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Methanol</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>E-85</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>M-85</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>A55</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
<FUEL_TYPE>Biodiesel</FUEL_TYPE>
<RATE COUNTRY="US" RATECHANGE="0">0.3734</RATE>
<RATE COUNTRY="CAN" RATECHANGE="0">0.1300</RATE>
</RECORD>'
;WITH xCTE AS
(
SELECT @xml.query('
<Data jurisdiction="{data(/RECORD/JURISDICTION/text())}" country="{data(/RECORD/COUNTRY/text())}">
{
for $x in (/RECORD/*[local-name() = "RATE"])
return
<FuelType name="{data(/RECORD/FUEL_TYPE[. << $x][last()])}" country="{data($x/@COUNTRY)}" value="{data($x)}" />
}
</Data>
') AS DocXml
)
SELECT
x.n.value('../@jurisdiction', 'VARCHAR(10)') as Jurisdiction
,x.n.value('../@country', 'VARCHAR(5)') as Country
,x.n.value('@name', 'VARCHAR(20)') as Name
,x.n.value('@country', 'VARCHAR(5)') as RateCountry
,x.n.value('@value', 'DECIMAL(10,4)') as Rate
FROM xCTE
CROSS APPLY DocXml.nodes('/Data/FuelType') x(n)
Basic idea: use XQuery in a CTE to transform the original document into something more manageable - getting the last FUEL_TYPE
node that came before the current RATE
node, and stuffing everything into a single XML element (or group of elements - attributes are a bit more compact and easier to work with here IMO).
This doesn't quite give you the output you want though - it gives an output like this:
Jurisdiction Country Name RateCountry Rate
------------ ------- -------------------- ----------- ---------------------------------------
AB CAN Gasoline US 0.3734
AB CAN Gasoline CAN 0.1300
AB CAN Special Diesel US 0.3734
AB CAN Special Diesel CAN 0.1300
...
If you want to go from there you should be able to do some kind of CROSS APPLY
or PIVOT
, but before I'd go down that road I'd consider if that's really what you want or if this will actually work better.
Upvotes: 3