Stanislav Beremski

Reputation: 102

How can I extract multiple elements from an XML tree with a single #each iteration?

My xml looks something like this:

<entry>
  <updated>2012-11-14T13:58:49-07:00</updated>
  <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
  <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
<entry>
  <updated>2012-11-14T13:58:49-07:00</updated>
  <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
  <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>

I would like to use Nokogiri to grab some data from the XML. Namely, I am interested in the im:id and im:bundleId attributes and the <title> text from the XML above.

I have managed to get to the stage where this works:

xml.css("entry id").each do |entry|
   puts entry["im:id"]
   puts entry["im:bundleid"]
end

The problem is that to get the title content I would have to iterate through xml.css("entry title") separately. Is there any way of iterating through the entries and then pulling out the id data and the title in the same loop?

Upvotes: 0

Views: 242

Answers (1)

the Tin Man

Reputation: 160571

First, your example XML isn't well-formed: the entry tags aren't nested inside a single root element, so that needs to be fixed:

<root>
  <entry>
    <updated>2012-11-14T13:58:49-07:00</updated>
    <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
    <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
  </entry>
  <entry>
    <updated>2012-11-14T13:58:49-07:00</updated>
    <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
    <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
  </entry>
</root>

Then, this works:

require 'nokogiri'
require 'pp'

doc = Nokogiri::XML(<<EOT)
<root>
  <entry>
    <updated>2012-11-14T13:58:49-07:00</updated>
    <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
    <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
  </entry>
  <entry>
    <updated>2012-11-14T13:58:49-07:00</updated>
    <id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
    <title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
  </entry>
</root>
EOT

pp doc.search('entry').map{ |e|
  id = e.at('id')
  [
    id['id'],
    id['bundleId'],
    e.at('title').text
  ]
}

Which looks like:

[["557137623",
  "com.rovio.angrybirdsstarwars",
  "Angry Birds Star Wars - Rovio Entertainment Ltd"],
["557137623",
  "com.rovio.angrybirdsstarwars",
  "Angry Birds Star Wars - Rovio Entertainment Ltd"]]

This works because I'm walking through the entry tags. For each entry, I look for the id tag and remember it, which makes it easy to look into it repeatedly for the id and bundleId attributes. Then it's a simple case of looking inside e for the title tag.

I'm sure it could be done using some funky XPath, but I'm mortal and like to keep it simple.

Upvotes: 6
