user5395930
user5395930

Reputation:

Converting Complex XML to CSV

I have some (complex to me) XML code that I need to convert into CSV, I need absolutely every value added to the CSV for every submission, I have tried a few basic things however I cant get past the deep nesting and the different structures of this file.

Could someone please help me with a powershell script that would, I have started but cannot get the output of all data out I only get Canvas Results

Submissions.xml To large to post here (102KB)

$d=([xml](gc submissions.xml)).CANVASRESULTS | % { 
  foreach ($i in $_.CANVASRESULTS) {
    $o = New-Object Object
    Add-Member -InputObject $o -MemberType NoteProperty -Name Submissions -Value $_.Submission
    Add-Member -InputObject $o -MemberType NoteProperty -Name Submission -Value $i
    $o
  }
}
$d | ConvertTo-Csv -NoTypeInformation -Delimiter ","

Upvotes: 0

Views: 2468

Answers (1)

Parfait
Parfait

Reputation: 107767

Anytime a complex XML has deeply nested structures and you require migration into a flat file format (i.e., txt, csv, xlsx, sql), consider using XSLT to simplify your XML format. As information, XSLT is a declarative, special-purpose programming language used to style, re-format, re-structure XML/HTML and other SGML markup documents for various end-use purposes. Aside - SQL is also a declarative, special-purpose programming language.

For most softwares to import XML into flat file formats in two dimensions of rows and columns, XML files must follow repeating elements (i.e., rows/records) with one level of children for columns/fields:

<data>
  <row>
     <column1>value</column1>
     <column1>value</column1>
     <column1>value</column1>
     ...
  </row>
  <row>
  ...
</data>

Nearly every programming language maintains an XSLT processor including PowerShell, Java, C#, Perl, PHP, Python, SAS, even VBA with your everyday MS Excel. For your complex XML, below is an example XSLT stylesheet with following output. Do note I manually create nodes based on values from original XML:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>
 <xsl:output method="xml" indent="yes"/>   

 <xsl:template match="CanvasResult">    
   <Data>    
    <xsl:for-each select="//Responses">
     <Submission>  
       <Fitter><xsl:value-of select="Response[contains(Label, 'Fitter Name')]/Value"/></Fitter>
       <Date><xsl:value-of select="Response[Label='Date']/Value"/></Date>
       <Time><xsl:value-of select="Response[Label='Time']/Value"/></Time>
       <Client><xsl:value-of select="Response[Label='Client']/Value"/></Client>
       <Machine><xsl:value-of select="Response[Label='Machine']/Value"/></Machine>
       <Hours><xsl:value-of select="Response[Label='Hours']/Value"/></Hours>
       <Signature><xsl:value-of select="Response[Label='Signature']/Value"/></Signature>
       <SubmissionDate><xsl:value-of select="Response[Label='Submission Date:']/Value"/></SubmissionDate>
       <SubmissionTime><xsl:value-of select="Response[Label='Submission Time:']/Value"/></SubmissionTime>
       <Customer><xsl:value-of select="Response[Label='Customer:']/Value"/></Customer>
       <PlantLocation><xsl:value-of select="Response[Label='Plant Location']/Value"/></PlantLocation>
       <PlantType><xsl:value-of select="Response[Label='Plant Type:']/Value"/></PlantType>
       <PlantID><xsl:value-of select="Response[Label='Plant ID:']/Value"/></PlantID>
       <PlantHours><xsl:value-of select="Response[Label='Plant Hours:']/Value"/></PlantHours>
       <RegoExpiryDate><xsl:value-of select="Response[Label='Rego Expiry Date:']/Value"/></RegoExpiryDate>
       <Comments><xsl:value-of select="Response[Label='Comments:']/Value"/></Comments>        
     </Submission>
    </xsl:for-each>      
  </Data>   
 </xsl:template>

</xsl:stylesheet>

Output

<?xml version='1.0' encoding='UTF-8'?>
<Data>
 ...
 <Submission>
    <Fitter>Damian Stewart</Fitter>
    <Date/>
    <Time/>
    <Client/>
    <Machine/>
    <Hours/>
    <Signature/>
    <SubmissionDate>28/09/2015</SubmissionDate>
    <SubmissionTime>16:30</SubmissionTime>
    <Customer>Dicks Diesels</Customer>
    <PlantLocation/>
    <PlantType>Dozer</PlantType>
    <PlantID>DZ09</PlantID>
    <PlantHours>2213.6</PlantHours>
    <RegoExpiryDate>05/03/2016</RegoExpiryDate>
    <Comments>Moving tomorrow from Daracon BOP to KCE BOP S6A Dam&#13;
Cabbie to operate</Comments>
  </Submission>
  ...
</Data>

From there, you can import the two-dimensional XML into a usable rows/columns format. Below are the same import into an MS Access Database and MS Excel spreadsheet. You will notice gaps in the data due to XML content not populating the created nodes (handled in XSLT). A simple SQL cleanup can render final dataset.

XML Import into Access Database

Database Import

XML Import into Excel spreadsheet

Upvotes: 2

Related Questions