Reputation: 478
I have an XML file:
?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
<Offer>
<Model><![CDATA[11016001]]></Model>
<Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
<ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
...lots more nodes
<Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
<li>Odor-resistant, easy to clean, and quick to dry.</li>
<li>Ventilation ports for enhanced breathability.</li>
<li>Lightweight, non-marking soles.</li>
<li>Water-friendly and buoyant; weighs only ounces.</li>
<li>Fully molded Croslite™ material for lightweight cushioning and comfort.</li>
<li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
</Offer>
....lots lots more <Offer> entries
</Offers>
I want to parse each instance of 'Offer' into its own row in a CSV file:
require 'csv'
require 'nokogiri'
file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb')
doc.css('Offer').each do |node|
a.push << node.content.split
end
a.each { |a| csv << a }
This runs nicely except I'm splitting on whitespace rather than each element of the Offer node so every word is going into its own column in the CSV file.
Is there a way to pick up the content of each node and how do I use the node names as headers in the CSV file?
Upvotes: 2
Views: 4090
Reputation: 14051
Try this, and modify it to push into your CSV:
doc.css('Offer').first.elements.each do |n|
puts "#{n.name}: #{n.content}"
end
Upvotes: 0
Reputation: 7561
This assumes that each Offer
element always has the same child nodes (though they can be empty):
CSV.open('output.csv', 'wb') do |csv|
doc.search('Offer').each do |x|
csv << x.search('*').map(&:text)
end
end
And to get headers (from the first Offer
element):
CSV.open('output.csv', 'wb') do |csv|
csv << doc.at('Offer').search('*').map(&:name)
doc.search('Offer').each do |x|
csv << x.search('*').map(&:text)
end
end
search
and at
are Nokogiri functions that can take either XPath or CSS selector strings. at
will return the first occurrence of an element; search
will provide an array of matching elements (or an empty array if no matches are found). The *
in this case will select all nodes that are direct children of the current node.
Both name
and text
are also Nokogiri functions (for an element). name
provides the element's name; text
provides the text or CDATA content of a node.
Upvotes: 6