Reputation: 30699
I've got a dozen XML files which contain the results of some wcat web performance tests. Within each XML file there are data nodes containing the name of each page requested and the average time it took to load. I want to extract that information from each XML file and output it to a CSV file so I can create a nice pretty graph in Excel.
I could do the task in my main working language of C#, but in an attempt to improve my scripting skills I'd like to try to do it using Unix/Cygwin commands or a scripting language such as Ruby.
The format of the XML file is:
<report name="wcat" version="6.3.1" level="1" top="100">
<section name="header" key="90000">
... lots of other XML junk...
<item>
<data name="reportt" >Request Name I</data>
...
<data name="avgttlb" >628</data>
</item>
<item>
<data name="reportt" >Request Name II</data>
...
<data name="avgttlb" >793</data>
</item>
... lots of other XML junk...
</section>
</report>
And the CSV output I need is:
Request,File 1,File 2,...,File 12
Request Name I,628,123,...,789
Request Name II,793,456,...,987
Are there any good Cygwin command-line utilities that could parse the XML? Or, failing that, is there a nice way to do it in Ruby?
Upvotes: 1
Views: 2795
Reputation: 160551
Ruby has a nice parser called Nokogiri that I really like. It supports both XML and HTML, DOM and SAX, and can build XML if that's your fancy. It's built on libxml2.
#!/usr/bin/env ruby -w
xml = <<END_XML
<report name="wcat" version="6.3.1" level="1" top="100">
  <section name="header" key="90000">
    <item>
      <data name="reportt" >Request Name I</data>
      <data name="avgttlb" >628</data>
    </item>
    <item>
      <data name="reportt" >Request Name II</data>
      <data name="avgttlb" >793</data>
    </item>
  </section>
</report>
END_XML

require 'nokogiri'

doc = Nokogiri::XML(xml)

# For each <item>, grab the text of its <data> children: the request name
# and the average time to last byte.
content = doc.search('item').map { |i|
  i.search('data').map { |d| d.text }
}

content.each do |c|
  puts c.join(',')
end
# >> Request Name I,628
# >> Request Name II,793
Notice that Nokogiri allows use of CSS accessors, which I'm using here, in addition to the standard XPath accessors. The actual parsing took the middle four lines.
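For comparison, here's the same extraction done with XPath expressions instead of CSS selectors. This is just a sketch: the file name is assumed, and the @name values are taken from the question's sample.
require 'nokogiri'

doc = Nokogiri::XML(File.read('test.xml'))  # assumed file name

# Same idea as above, but using XPath instead of CSS selectors.
rows = doc.xpath('//item').map do |item|
  [item.at_xpath('data[@name="reportt"]').text,
   item.at_xpath('data[@name="avgttlb"]').text]
end

rows.each { |row| puts row.join(',') }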
Ruby's got a built-in CSV generator/parser, but for this quick 'n dirty example I didn't use it.
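If you do want the merged layout from the question (one column per result file), a rough sketch using the standard csv library could look like the following. The glob pattern and output file name are my assumptions, and it assumes each request name appears once in every file.
require 'nokogiri'
require 'csv'

files = Dir['results/*.xml'].sort  # assumed location of the twelve wcat reports
times = Hash.new { |h, k| h[k] = [] }

files.each do |file|
  doc = Nokogiri::XML(File.read(file))
  doc.search('item').each do |item|
    name = item.at('data[name="reportt"]').text
    times[name] << item.at('data[name="avgttlb"]').text
  end
end

# One header column per file, one row per request name.
CSV.open('wcat.csv', 'w') do |csv|
  csv << ['Request'] + files
  times.each { |name, avgs| csv << [name] + avgs }
end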
Upvotes: 1
Reputation: 3208
In Python:
from xml.etree import ElementTree
import csv

result = []
tree = ElementTree.parse('test.xml')
section = tree.getroot().find('section')
items = section.findall('item')

# Each <item> becomes one row: request name, then average time to last byte.
for item in items:
    records = item.findall('data')
    row = [rec.text for rec in records]
    result.append(row)

with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(result)
Upvotes: 1
Reputation: 43158
What you're describing could be done in XSLT, which supports a text output method, multiple input files (via the document() function), and of course templates.
I know some people find XSLT gross, but I use it all the time for this kind of thing and rather like it. Plus it's pretty much platform-independent.
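For illustration, here's a minimal XSLT 1.0 sketch of my own (not tested against a real wcat report) that turns one result file into name,avgttlb lines using the text output method. The data names come from the question's sample, and combining all twelve files via document() is left out.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- One CSV line per item: request name, average time to last byte -->
    <xsl:for-each select="report/section/item">
      <xsl:value-of select="data[@name='reportt']"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="data[@name='avgttlb']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>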
Upvotes: 2