Emmanuel Eleyae

Reputation: 15

Get all attributes for elements in XML file

I'm trying to parse a file and get all of the attributes for each <row> tag in the file. The file looks generally like this:

<?xml version="1.0" standalone="yes"?>
<report>
  <table>
    <columns>
      <column name="month"/>
      <column name="campaign"/>
      <!-- many columns -->
    </columns>
    <rows>
      <row month="December 2009" campaign="Campaign #1"
           adgroup="Python" preview="Not available"
           headline="We Write Apps in Python"
           and="many more attributes here" />
      <row month="December 2009" campaign="Campaign #1"
           adgroup="Ruby" preview="Not available"
           headline="We Write Apps in Ruby"
           and="many more attributes here" />
      <!-- many such rows -->
    </rows>
  </table>
</report>

Here is the full file: http://pastie.org/7268456#2.

I've looked at every tutorial and answer I can find on various help boards, but they all assume the same thing: that I'm searching for one or two specific tags and just need one or two values from them. I actually have 18 attributes on each <row> tag, and I have a MySQL table with a column for each of the 18 attributes. I need to put the information into an object/hash/array that I can use to insert into the table with ActiveRecord/Ruby.

I started out using Hpricot; you can see the code (which is not relevant) in the edit history of this question.

Upvotes: 0

Views: 2034

Answers (1)

Phrogz

Reputation: 303198

require 'nokogiri'
doc = Nokogiri.XML(my_xml_string)
doc.css('row').each do |row|
  # row is a Nokogiri::XML::Element
  row.attributes.each do |name,attr|
    # name is a string
    # attr is a Nokogiri::XML::Attr
    p name => attr.value
  end
end
#=> {"month"=>"December 2009"}
#=> {"campaign"=>"Campaign #1"}
#=> {"adgroup"=>"Python"}
#=> {"preview"=>"Not available"}
#=> {"headline"=>"We Write Apps in Python"}
#=> etc.

Alternatively, if you just want an array of hashes mapping attribute names to string values:

rows = doc.css('row').map{ |row| Hash[ row.attributes.map{|n,a| [n,a.value]} ] }
#=> [
#=>  {"month"=>"December 2009", "campaign"=>"Campaign #1", "adgroup"=>"Python", … },
#=>  {"month"=>"December 2009", "campaign"=>"Campaign #1", "adgroup"=>"Ruby", … },
#=>  …
#=> ]
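Since the goal is to insert these into a table, here is a minimal sketch of preparing such hashes for ActiveRecord. The COLUMNS list and the Ad model are hypothetical names used for illustration only; the sketch assumes the table's column names match the XML attribute names, and it uses a hand-written rows array so that it runs without Nokogiri or a database.

```ruby
# Hypothetical list of table columns; in a real app this might come
# from the model itself (for example, its column names).
COLUMNS = %w[month campaign adgroup preview headline].freeze

# Stand-in for the array of hashes built from the XML.
rows = [
  { "month" => "December 2009", "campaign" => "Campaign #1",
    "adgroup" => "Python", "and" => "many more attributes here" },
  { "month" => "December 2009", "campaign" => "Campaign #1",
    "adgroup" => "Ruby", "and" => "many more attributes here" }
]

# Keep only attributes that have a matching column, and symbolize the keys.
records = rows.map do |attrs|
  attrs.select { |name, _value| COLUMNS.include?(name) }
       .transform_keys(&:to_sym)
end

p records.first
# With a hypothetical ActiveRecord model named Ad, each hash could then be
# passed along as:  records.each { |attrs| Ad.create!(attrs) }
```

Filtering against a known column list is a design choice: it keeps an unexpected attribute in the XML from raising an unknown-attribute error at insert time.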

The Nokogiri.XML method is the simplest way to parse an XML string and get a Nokogiri::XML::Document back.

The css method is the simplest way to find all the elements with a given name (ignoring their containment hierarchy and any XML namespaces). It returns a Nokogiri::XML::NodeSet, which is very similar to an array.

Each Nokogiri::XML::Element has an attributes method that returns a Hash mapping the name of each attribute to a Nokogiri::XML::Attr object containing all the information about the attribute (name, value, namespace, parent element, etc.).

Upvotes: 2
