Nicholas Saunders
Nicholas Saunders

Reputation: 764

How to read XML and write CSV with Powershell?

How is books.xml converted to CSV and written to a file?

PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books = Import-Clixml books.xml 
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books.InnerXml                  
<?xml version="1.0"?><catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book><book id="bk103"><author>Corets, Eva</author><title>Maeve Ascendant</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-11-17</publish_date><description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description></book><book id="bk104"><author>Corets, Eva</author><title>Oberon's Legacy</title><genre>Fantasy</genre><price>5.95</price><publish_date>2001-03-10</publish_date><description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description></book><book id="bk105"><author>Corets, Eva</author><title>The Sundered Grail</title><genre>Fantasy</genre><price>5.95</price><publish_date>2001-09-10</publish_date><description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description></book><book id="bk106"><author>Randall, Cynthia</author><title>Lover Birds</title><genre>Romance</genre><price>4.95</price><publish_date>2000-09-02</publish_date><description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description></book><book id="bk107"><author>Thurman, Paula</author><title>Splish Splash</title><genre>Romance</genre><price>4.95</price><publish_date>2000-11-02</publish_date><description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description></book><book id="bk108"><author>Knorr, Stefan</author><title>Creepy Crawlies</title><genre>Horror</genre><price>4.95</price><publish_date>2000-12-06</publish_date><description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description></book><book id="bk109"><author>Kress, Peter</author><title>Paradox Lost</title><genre>Science Fiction</genre><price>6.95</price><publish_date>2000-11-02</publish_date><description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description></book><book id="bk110"><author>O'Brien, Tim</author><title>Microsoft .NET: The Programming Bible</title><genre>Computer</genre><price>36.95</price><publish_date>2000-12-09</publish_date><description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description></book><book id="bk111"><author>O'Brien, Tim</author><title>MSXML3: A Comprehensive Guide</title><genre>Computer</genre><price>36.95</price><publish_date>2000-12-01</publish_date><description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description></book><book id="bk112"><author>Galos, Mike</author><title>Visual Studio 7: A Comprehensive Guide</title><genre>Computer</genre><price>49.95</price><publish_date>2001-04-16</publish_date><description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description></book></catalog>
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books | Export-Csv books.csv      
PS /home/nicholas/xml> 
PS /home/nicholas/xml> cat ./books.csv
"xml","catalog"
"version=""1.0""","System.Xml.XmlElement"
PS /home/nicholas/xml> 

It would seem that $csv is converted, but how is it written as a CSV file?

Converting to JSON didn't go much better:

PS /home/nicholas/xml> 
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books | ConvertTo-Json          
WARNING: Resulting JSON is truncated as serialization has exceeded the set depth of 2.
[
  [],
  [
    [
      "System.Xml.XmlElement",
      "System.Xml.XmlElement",
      "System.Xml.XmlElement",

..

even with a depth of 9 or greater.

Here's the output of $books:

PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books

xml           catalog
---           -------
version="1.0" catalog

PS /home/nicholas/xml> $books.ChildNodes

Version         : 1.0
Encoding        : 
Standalone      : 
Value           : version="1.0"
InnerText       : version="1.0"
Name            : xml
LocalName       : xml
NodeType        : XmlDeclaration
PreviousSibling : 
NextSibling     : catalog
ParentNode      : #document
ChildNodes      : {}
Attributes      : 
OwnerDocument   : #document
FirstChild      : 
LastChild       : 
HasChildNodes   : False
NamespaceURI    : 
Prefix          : 
IsReadOnly      : False
OuterXml        : <?xml version="1.0"?>
InnerXml        : 
SchemaInfo      : System.Xml.Schema.XmlSchemaInfo
BaseURI         : 
PreviousText    : 

book : {book, book, book, book…}

PS /home/nicholas/xml>

which is far from CSV.

Upvotes: 0

Views: 170

Answers (2)

jdweng
jdweng

Reputation: 34421

Try following :

using assembly System
using assembly System.Collections
using assembly System.Xml.Linq
using assembly System.IO

$inputFilename = "c:\temp\test.xml"
$outputFilename = "c:\temp\test.csv"


$doc = [System.Xml.Linq.XDocument]::Load($inputFilename)
$writer = New-Object System.IO.StreamWriter($outputFilename)
$header = "id,author,title,genre,price,publish,description"
$writer.Writeline($header)

$books = $doc.Descendants("book")

foreach($book in $books)
{
 
   $id =  $book.Attribute("id").Value
   $author = $book.Element("author").Value
   $title = $book.Element("title").Value
   $genre = $book.Element("genre").Value
   $price = $book.Element("price").Value
   $date = $book.Element("publish_date").Value
   $description = $book.Element("description").Value
   $lineArray = @($id,$author,$title,$genre,$price,$date,$description)
   Write-Host $lineArray
   $line = [String]::Join(",", $lineArray)
   Write-Host $line
   $writer.Writeline($line)
}
$writer.Flush()
$writer.Close()

I update code below

  1. Added double quotes around the field because data contained commas.

  2. Merged description field to remove carriage return and extra spaces using Regex.

    using assembly System using assembly System.Collections using assembly System.Xml.Linq using assembly System.IO using assembly System.Text.RegularExpressions

    $inputFilename = "c:\temp\test.xml" $outputFilename = "c:\temp\test.csv"

    $doc = [System.Xml.Linq.XDocument]::Load($inputFilename) $writer = New-Object System.IO.StreamWriter($outputFilename) $headerArray = @("id","author","title","genre","price","publish","description") $header = [String]::Join(""",""", $headerArray) $writer.Writeline("""" + $header + """")

    $books = $doc.Descendants("book")

    foreach($book in $books) {

    $id =  $book.Attribute("id").Value
    $author = $book.Element("author").Value
    $title = $book.Element("title").Value
    $genre = $book.Element("genre").Value
    $price = $book.Element("price").Value
    $date = $book.Element("publish_date").Value
    $description = $book.Element("description").Value
    $description = [System.Text.RegularExpressions.Regex]::Replace($description, "\s+", " ");
    $lineArray = @($id,$author,$title,$genre,$price,$date,$description)
    #Write-Host $lineArray
    $line = """" + [String]::Join(""",""", $lineArray) + """" 
    #Write-Host $line
    $writer.Writeline($line)
    

    } $writer.Flush() $writer.Close()

Upvotes: 1

mklement0
mklement0

Reputation: 437813

Import-Clixml is not meant for importing arbitrary XML documents; it is solely designed to work with XML documents that contain CLIXML data, which is an XML-based serialization format for representing instances of .NET types for cross-process communication, such as produced by Export-Clixml. CLIXML is typically used behind the scenes in PowerShell remoting (and, more generally, cross-process communication).

Perhaps surprisingly, there is no general-purpose cmdlet for importing arbitrary XML documents (however, there is a cmdlet for querying and extracting data from XML documents, namely Select-Xml).

You have to work with the [xml] type (System.Xml.XmlDocument) to import XML data into a DOM, which then allows you to transform the data to a different format:

# Load and parse the XML file into a DOM.
($xmlDoc = [xml]::new()).Load((Convert-Path books.xml))

# Export the book child elements of the catalog element to a CSV file.
$xmlDoc.catalog.book | Export-Csv books.csv

# Parse the CSV file into objects ([pscustomobject] instances)
# and output them to the console with friendly formatting.
Import-Csv books.csv

Note: In Windows PowerShell, consider using -NoTypeInformation with Export-Csv, as well as the -Encoding parameter to control the output character encoding, which defaults to ASCII(!). In PowerShell (Core) 7+, fortunately, -NoTypeInformation is now implied and the default character encoding is (BOM-less) UTF-8.

Output:

id           : bk101
author       : Gambardella, Matthew
title        : XML Developer's Guide
genre        : Computer
price        : 44.95
publish_date : 2000-10-01
description  : An in-depth look at creating applications 
                     with XML.

id           : bk102
author       : Ralls, Kim
title        : Midnight Rain
genre        : Fantasy
price        : 5.95
publish_date : 2000-12-16
description  : A former architect battles corporate zombies, 
                     an evil sorceress, and her own childhood to become queen 
                     of the world.

id           : bk103
author       : Corets, Eva
title        : Maeve Ascendant
genre        : Fantasy
price        : 5.95
publish_date : 2000-11-17
description  : After the collapse of a nanotechnology 
                     society in England, the young survivors lay the 
                     foundation for a new society.

id           : bk104
author       : Corets, Eva
title        : Oberon's Legacy
genre        : Fantasy
price        : 5.95
publish_date : 2001-03-10
description  : In post-apocalypse England, the mysterious 
                     agent known only as Oberon helps to create a new life 
                     for the inhabitants of London. Sequel to Maeve 
                     Ascendant.

id           : bk105
author       : Corets, Eva
title        : The Sundered Grail
genre        : Fantasy
price        : 5.95
publish_date : 2001-09-10
description  : The two daughters of Maeve, half-sisters, 
                     battle one another for control of England. Sequel to 
                     Oberon's Legacy.

id           : bk106
author       : Randall, Cynthia
title        : Lover Birds
genre        : Romance
price        : 4.95
publish_date : 2000-09-02
description  : When Carla meets Paul at an ornithology 
                     conference, tempers fly as feathers get ruffled.

id           : bk107
author       : Thurman, Paula
title        : Splish Splash
genre        : Romance
price        : 4.95
publish_date : 2000-11-02
description  : A deep sea diver finds true love twenty 
                     thousand leagues beneath the sea.

id           : bk108
author       : Knorr, Stefan
title        : Creepy Crawlies
genre        : Horror
price        : 4.95
publish_date : 2000-12-06
description  : An anthology of horror stories about roaches,
                     centipedes, scorpions  and other insects.

id           : bk109
author       : Kress, Peter
title        : Paradox Lost
genre        : Science Fiction
price        : 6.95
publish_date : 2000-11-02
description  : After an inadvertant trip through a Heisenberg
                     Uncertainty Device, James Salway discovers the problems 
                     of being quantum.

id           : bk110
author       : O'Brien, Tim
title        : Microsoft .NET: The Programming Bible
genre        : Computer
price        : 36.95
publish_date : 2000-12-09
description  : Microsoft's .NET initiative is explored in 
                     detail in this deep programmer's reference.

id           : bk111
author       : O'Brien, Tim
title        : MSXML3: A Comprehensive Guide
genre        : Computer
price        : 36.95
publish_date : 2000-12-01
description  : The Microsoft MSXML3 parser is covered in 
                     detail, with attention to XML DOM interfaces, XSLT processing, 
                     SAX and more.

id           : bk112
author       : Galos, Mike
title        : Visual Studio 7: A Comprehensive Guide
genre        : Computer
price        : 49.95
publish_date : 2001-04-16
description  : Microsoft Visual Studio 7 is explored in depth,
                     looking at how Visual Basic, Visual C++, C#, and ASP+ are 
                     integrated into a comprehensive development 
                     environment.

Upvotes: 1

Related Questions