How can I extract multiple elements from an XML tree with a single #each iteration?

Question

My xml looks something like this:


  2012-11-14T13:58:49-07:00
  Some text
  Angry Birds Star Wars - Rovio Entertainment Ltd


  2012-11-14T13:58:49-07:00
  Some text
  Angry Birds Star Wars - Rovio Entertainment Ltd

I would like to use Nokogiri to grab some data from the XML. Namely, I am interested in the im:id, the im:bundleId and the title from the XML above. I have managed to get to the stage where this works:

xml.css("entry id").each do |entry|
  puts entry["im:id"]
  puts entry["im:bundleid"]
end

The problem is that to get the title content I would have to iterate through xml.css("entry title") separately. Is there any way of iterating through the entries and then pulling out the id data and the title in the same loop?

the Tin Man · Accepted Answer

First, your example XML isn't correctly nested, so that needs to be fixed:


  
    2012-11-14T13:58:49-07:00
    Some text
    Angry Birds Star Wars - Rovio Entertainment Ltd
  
  
    2012-11-14T13:58:49-07:00
    Some text
    Angry Birds Star Wars - Rovio Entertainment Ltd

Then, this works:

require 'nokogiri'
require 'pp'

doc = Nokogiri::XML(<<EOT)
  
    2012-11-14T13:58:49-07:00
    Some text
    Angry Birds Star Wars - Rovio Entertainment Ltd
  
  
    2012-11-14T13:58:49-07:00
    Some text
    Angry Birds Star Wars - Rovio Entertainment Ltd
  

EOT

pp doc.search('entry').map{ |e|
  id = e.at('id')
  [
    id['id'],
    id['bundleId'],
    e.at('title').text
  ]
}

Which looks like:

[["557137623",
  "com.rovio.angrybirdsstarwars",
  "Angry Birds Star Wars - Rovio Entertainment Ltd"],
["557137623",
  "com.rovio.angrybirdsstarwars",
  "Angry Birds Star Wars - Rovio Entertainment Ltd"]]

This works because I'm walking through the entry tags rather than the id tags. For each entry, I look for the id tag and remember it in a variable, which makes it easy to read both its id and bundleId attributes without searching again. Then it's a simple case of looking inside e for the title tag.

I'm sure it could be done using some funky XPath, but I'm mortal and like to keep it simple.
