Reputation: 30699
I've got a dozen XML files which contain the results of some wcat web performance tests. Within each XML file there are data nodes containing the name of each page requested and the average time it took to load. I want to extract that information from each XML file and output it to a CSV file so I can create a nice pretty graph in Excel.
I could do the task in my main working language of C#, but in an attempt to improve my scripting skills I'd like to try to do it using Unix/Cygwin commands or a scripting language such as Ruby.
The format of the XML file is:
<report name="wcat" version="6.3.1" level="1" top="100">
<section name="header" key="90000">
... lots of other XML junk...
<item>
<data name="reportt" >Request Name I</data>
...
<data name="avgttlb" >628</data>
</item>
<item>
<data name="reportt" >Request Name II</data>
...
<data name="avgttlb" >793</data>
</item>
... lots of other XML junk...
</section>
</report>
And the CSV output I need is:
Request,File 1,File 2,...,File 12
Request Name I,628,123,...,789
Request Name II,793,456,...,987
Are there any good Cygwin command-line utilities that could parse the XML? Or, failing that, is there a nice way to do it in Ruby?
Upvotes: 1
Views: 2795
Reputation: 160551
Ruby has a nice parser called Nokogiri that I really like. It supports both XML and HTML, DOM and SAX, and can build XML if that's your fancy. It's built on libxml2.
#!/usr/bin/env ruby -w
xml = <<END_XML
<report name="wcat" version="6.3.1" level="1" top="100">
  <section name="header" key="90000">
    <item>
      <data name="reportt" >Request Name I</data>
      <data name="avgttlb" >628</data>
    </item>
    <item>
      <data name="reportt" >Request Name II</data>
      <data name="avgttlb" >793</data>
    </item>
  </section>
</report>
END_XML

require 'nokogiri'

doc = Nokogiri::XML(xml)

# For each <item>, grab the text of its <data> children: the request name
# and the average time to last byte.
content = doc.search('item').map { |i|
  i.search('data').map { |d| d.text }
}

content.each do |c|
  puts c.join(',')
end
# >> Request Name I,628
# >> Request Name II,793
Notice that Nokogiri allows use of CSS accessors, which I'm using here, in addition to the standard XPath accessors. The actual parsing took the middle four lines.
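For comparison, here's the same extraction done with XPath expressions instead of CSS selectors. This is just a sketch: the file name is assumed, and the @name values are taken from the question's sample.
require 'nokogiri'

doc = Nokogiri::XML(File.read('test.xml'))  # assumed file name

# Same idea as above, but using XPath instead of CSS selectors.
rows = doc.xpath('//item').map do |item|
  [item.at_xpath('data[@name="reportt"]').text,
   item.at_xpath('data[@name="avgttlb"]').text]
end

rows.each { |row| puts row.join(',') }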
Ruby's got a built-in CSV generator/parser, but for this quick 'n dirty example I didn't use it.
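If you do want the merged layout from the question (one column per result file), a rough sketch using the standard csv library could look like the following. The glob pattern and output file name are my assumptions, and it assumes each request name appears once in every file.
require 'nokogiri'
require 'csv'

files = Dir['results/*.xml'].sort  # assumed location of the twelve wcat reports
times = Hash.new { |h, k| h[k] = [] }

files.each do |file|
  doc = Nokogiri::XML(File.read(file))
  doc.search('item').each do |item|
    name = item.at('data[name="reportt"]').text
    times[name] << item.at('data[name="avgttlb"]').text
  end
end

# One header column per file, one row per request name.
CSV.open('wcat.csv', 'w') do |csv|
  csv << ['Request'] + files
  times.each { |name, avgs| csv << [name] + avgs }
end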
Upvotes: 1
Reputation: 3208
In Python:
from xml.etree import ElementTree
import csv

result = []
tree = ElementTree.parse('test.xml')
section = tree.getroot().find('section')
items = section.findall('item')

# Each <item> becomes one row: request name, then average time to last byte.
for item in items:
    records = item.findall('data')
    row = [rec.text for rec in records]
    result.append(row)

with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(result)
Upvotes: 1
Reputation: 43158
What you're describing could be done in XSLT, which supports a text output method, multiple input files (via the document() function), and of course templates.
I know some people find XSLT gross, but I use it all the time for this kind of thing and rather like it. Plus it's pretty much platform-independent.
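For illustration, here's a minimal XSLT 1.0 sketch of my own (not tested against a real wcat report) that turns one result file into name,avgttlb lines using the text output method. The data names come from the question's sample, and combining all twelve files via document() is left out.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- One CSV line per item: request name, average time to last byte -->
    <xsl:for-each select="report/section/item">
      <xsl:value-of select="data[@name='reportt']"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="data[@name='avgttlb']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>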
Upvotes: 2