Reputation: 17157
I'm trying to use Excel's get&transform functionality (previously known as powerquery) to import an XML data source. The data source has a list of b
tags, each with a variable number of d
tags in a c2
child, such as the following:
<a>
<b>
<c1>foo</c1>
<c2>
<d>bar</d>
</c2>
</b>
<b>
<c1>fuz</c1>
<c2>
<d>baz</d>
<d>quz</d>
</c2>
</b>
</a>
When I import this data with the following query the data type for column c2.d is different for the two different rows representing the b
items, for the first row it is a general spreadsheet cell type, for the second row it is a Table type.
let
Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
Table0 = Source{0}[Table],
#"Changed Type" = Table.TransformColumnTypes(Table0,{{"c1", type text}}),
#"Expanded c2" = Table.ExpandTableColumn(#"Changed Type", "c2", {"d"}, {"c2.d"})
in
#"Expanded c2"
It seems that for the first row it automatically converts the d
tag into a simple spreadsheet cell as there is only one and it only contains text. However for the second row it sees there are two d
tags and hence keeps it as a table. The problem now is that I can neither load the data as is as the Table in the second row is loaded into the spreadsheet as the literal string "Table"
leaving me without the actual data, nor can I further expand the Table using Table.ExpandTableColumn
as it (rightly) complains that bar
in the first row is not a table.
I presume the automatic conversion of a single tag containing text to a simple cell rather than a table happens either in the Xml.Tables
or ExpandTableColumn
functions. The tooltip for Xml.Tables
shows that it has an options
parameter, unfortunately the documentation for Xml.Tables
does not give any details on this options
parameter.
How can I get this second row expanded out to two rows, one each for the two d
tags contained in the second b
tag having the same "fuz"
string in the first column? Such an expansion works fine if the contents of the d
tags are further XML tags, but apparently not if the d
tags only contain text.
Upvotes: 1
Views: 200
Reputation: 40224
Let's add a step to make sure everything is at the same level:
let
Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
Table0 = Source{0}[Table],
Expandc2 = Table.ExpandTableColumn(Table0, "c2", {"d"}, {"d"}),
ToLists = Table.TransformColumns(Expandc2,
{"d", each if _ is table then Table.ToList(_) else {_}}),
ExpandLists = Table.ExpandListColumn(ToLists, "d")
in
ExpandLists
The ToLists
step turns this:
Into a more consistent list format:
c1 d
-----------------------
foo {"bar"}
fuz {"baz", "quz"}
Then you can expand to rows without mixed data types.
Upvotes: 1