Reputation: 15
I'm trying to parse a file and get all of the attributes for each <row> tag in the file. The file looks generally like this:
<?xml version="1.0" standalone="yes"?>
<report>
<table>
<columns>
<column name="month"/>
<column name="campaign"/>
<!-- many columns -->
</columns>
<rows>
<row month="December 2009" campaign="Campaign #1"
adgroup="Python" preview="Not available"
headline="We Write Apps in Python"
and="many more attributes here" />
<row month="December 2009" campaign="Campaign #1"
adgroup="Ruby" preview="Not available"
headline="We Write Apps in Ruby"
and="many more attributes here" />
<!-- many such rows -->
</rows></table></report>
Here is the full file: http://pastie.org/7268456#2.
I've looked at every tutorial and answer I can find on various help boards, but they all assume the same thing: that I'm searching for one or two specific tags and need only one or two values from them. I actually have 18 attributes for each <row> tag, and I have a MySQL table with a column for each of the 18 attributes. I need to put the information into an object/hash/array that I can use to insert into the table with ActiveRecord/Ruby.
I started out using Hpricot; you can see the code (which is not relevant) in the edit history of this question.
Upvotes: 0
Views: 2034
Reputation: 303198
require 'nokogiri'

doc = Nokogiri.XML(my_xml_string)
doc.css('row').each do |row|
  # row is a Nokogiri::XML::Element
  row.attributes.each do |name, attr|
    # name is a string
    # attr is a Nokogiri::XML::Attr
    p name => attr.value
  end
end
#=> {"month"=>"December 2009"}
#=> {"campaign"=>"Campaign #1"}
#=> {"adgroup"=>"Python"}
#=> {"preview"=>"Not available"}
#=> {"headline"=>"We Write Apps in Python"}
#=> etc.
Alternatively, if you just want an array of hashes mapping attribute names to string values:
rows = doc.css('row').map{ |row| Hash[ row.attributes.map{|n,a| [n,a.value]} ] }
#=> [
#=> {"month"=>"December 2009", "campaign"=>"Campaign #1", "adgroup"=>"Python", … },
#=> {"month"=>"December 2009", "campaign"=>"Campaign #1", "adgroup"=>"Ruby", … },
#=> …
#=> ]
The Nokogiri.XML method is the simplest way to parse an XML string and get a Nokogiri::XML::Document back.
The css method is the simplest way to find all the elements with a given name (ignoring their containment hierarchy and any XML namespaces). It returns a Nokogiri::XML::NodeSet, which is very similar to an array.
Each Nokogiri::XML::Element has an attributes method that returns a Hash mapping the name of the attribute to a Nokogiri::XML::Attr object containing all the information about the attribute (name, value, namespace, parent element, etc.).
Upvotes: 2