gorlaz

Reputation: 478

How to parse XML nodes to CSV with Ruby and Nokogiri

I have an XML file:

<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
  <Offer>
   <Model><![CDATA[11016001]]></Model>
   <Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
   <ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
   ...lots more nodes
   <Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
    <li>Odor-resistant, easy to clean, and quick to dry.</li>
    <li>Ventilation ports for enhanced breathability.</li>
    <li>Lightweight, non-marking soles.</li>
    <li>Water-friendly and buoyant; weighs only ounces.</li>
    <li>Fully molded Croslite&trade; material for lightweight cushioning and comfort.</li>
    <li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
  </Offer>
....lots lots more <Offer> entries
</Offers>

I want to parse each instance of 'Offer' into its own row in a CSV file:

require 'csv'
require 'nokogiri'

file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb') 

doc.css('Offer').each do |node|
  a << node.content.split
end

a.each { |row| csv << row }
csv.close

This runs, but I'm splitting on whitespace rather than on each child element of the Offer node, so every word goes into its own column in the CSV file.

Is there a way to pick up the content of each child node? And how do I use the node names as headers in the CSV file?

Upvotes: 2

Views: 4090

Answers (2)

Abdo

Reputation: 14051

Try this, and modify it to push into your CSV:

doc.css('Offer').first.elements.each do |n|
  puts "#{n.name}: #{n.content}"
end

Upvotes: 0

Jacob Brown

Reputation: 7561

This assumes that each Offer element always has the same child nodes (though they can be empty):

CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

And to get headers (from the first Offer element):

CSV.open('output.csv', 'wb') do |csv|
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

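search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if there is none); search returns a NodeSet of all matching elements (empty if nothing matches). Here * is interpreted as a CSS selector, so it matches every element beneath the current node, not only its direct children. For these Offer elements the result is the same, because the children hold only text and CDATA. To restrict the match to direct children, x.elements can be used instead.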

Both name and text are also Nokogiri functions (for an element). name provides the element's name; text provides the text or CDATA content of a node.

Upvotes: 6
