Reputation: 48357
I am trying to convert an XML file to CSV. While the question here was useful, I was unable to apply the suggestions to my case - presumably because my entities are multi-valued. My Xml looks something like:
<?xml version="1.0" encoding="UTF-8">
<ReportOutput Version="1">
<ReportFilters>
<Filter Name="Report Name" Value="My report"/>
<Filter Name="Path" Value="/">
<Filter Name="attr1" Value="*">
...
</ReportFilters>
<ReportHeader>
<columnHeader>attr1</columnheader>
<columnHeader>attr2</columnheader>
<columnHeader>attr3</columnheader>
...
</ReportHeader>
<ReportRecord>
<item>1</item>
<item>first</item>
<item>A</item>
...
</ReportRecord>
<ReportRecord>
<item>2</item
<item>second</item>
<item>B</item>
...
</ReportRecord>
...
</ReportOutput>
(where '...' represents one or recurrences of the previous node pattern)
The XML is merely a thin wrapper aroun d something which is intrinsically a tabular dataset - the ReportHeader and ReportRecord nodes all contain the same number of childnodes.
I want the ReportHeader.columnHeaders and ReportRecord.Items in my CSV file:
attr1, attr2, attr3 ...
1, first, A ...
2, second, B ...
I can remove the ReportFilters
easily enough:
[xml]$xml = Get-Content data.xml
$filter=$xml.ReportOutput.ReportFilters
$filter.ParentNode.RemoveChild($filter)
but iterating through the data is bit more tricky.
$xml.ReportOutput.ChildNodes | Export-Csv "C:\Temp\report.csv" -NoTypeInformation -Delimiter:"," -Encoding:UTF8
The first record in the CSV file is the single attribute '"columnHeader"', second record is '"System.Object[]"' thereafter, lots of blank lines.
{
$xml.ReportOutput.ReportHeader | ConvertTo-Csv -NoTypeInformation -Delimiter:","
foreach ($r in $xml.ReportRecord) {
$r | ConvertTo-Csv -NoTypeInformation -Delimiter:","
}
} | Set-Content -Path "C:\Temp\report.csv" -Encoding:UTF8
wrote part of the source code to the output stream.
$xml.ReportOutput.ReportHeader | ConvertTo-Csv -NoTypeInformation -Delimiter:"," | Set-Content -Path "C:\Temp\report.csv" -Encoding:UTF8
foreach ($r in $xml.ReportOutput.ReportRecord) {
$r | ConvertTo-Csv -NoTypeInformation -Delimiter:"," | Add-Content -Path "C:\Temp\report.csv" -Encoding:UTF8
}
just wrote lots of gibberish.
$xml.ReportOutput.ReportHeader.ChildNodes | ConvertTo-Csv -NoTypeInformation -Delimiter:"," | Set-Content -Path "C:\Temp\report.csv" -Encoding:UTF8
foreach ($r in $xml.ReportOutput.ReportRecord) {
$r.ChildNodes | ConvertTo-Csv -NoTypeInformation -Delimiter:"," | Add-Content -Path "C:\Temp\report.csv" -Encoding:UTF8
}
had the data - but as one attribute per record
Upvotes: 0
Views: 881
Reputation: 48357
Greg's (excellent) solution uses string splicing to create a prototype CSV file (in $csv) then converts it back to a powershell collection and then to CSV as a cleaning stage. The string splicing method is not very sophisticated and things get messy if the input data contains things seen as CSV meta-data (i.e. commas). Hence I found it safer to use a tab char:
$csv = @(($xml = [xml](Get-Content C:\path\input.xml)).SelectNodes('//columnHeader').'#text' -join "`t")
$csv += $xml.SelectNodes('//ReportRecord').ForEach{$_.item -join "`t"}
$csv | ConvertFrom-Csv -Delimiter "`t" | Export-Csv C:\path\output.csv
This is not completely robust, but good enough for me.
Upvotes: 0
Reputation:
I sincerely believe that your XML document is valid and all of the closing tags in it are correct (as it a hint). So...
# create header of the future CSV (attr1,attr2,attr3...)
$csv = @(($xml = [xml](Get-Content C:\path\input.xml)).SelectNodes('//columnHeader').'#text' -join ',')
# append lines to the future CSV
$csv += $xml.SelectNodes('//ReportRecord').ForEach{$_.item -join ','}
# at present moment $csv is the array
# attr1,attr2,attr3...
# 1,first,A...
# 2,second,B...
# time to write this data as CSV on disk
$csv | ConvertFrom-Csv | Export-Csv C:\path\output.csv
Hope this helps.
Upvotes: 1