My xml looks something like this:
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
I would like to use Nokogiri to grab some data from the XML. Namely, I am interested in the im:id, im:bundleId and <title> values from the XML above.
I have managed to get to the stage where this works:
xml.css("entry id").each do |entry|
  puts entry["im:id"]
  puts entry["im:bundleId"] # attribute names are case-sensitive
end
The problem is that to get the title content I would have to iterate through xml.css("entry title") separately. Is there any way of iterating through the entries and pulling out both the id data and the title in the same loop?
First, your example XML isn't well-formed: it has two top-level entry elements and no single root element, so it needs to be wrapped in one:
<root>
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
</root>
Then, this works:
require 'nokogiri'
require 'pp'
doc = Nokogiri::XML(<<EOT)
<root>
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
<entry>
<updated>2012-11-14T13:58:49-07:00</updated>
<id im:id="557137623" im:bundleId="com.rovio.angrybirdsstarwars">Some text</id>
<title>Angry Birds Star Wars - Rovio Entertainment Ltd</title>
</entry>
</root>
EOT
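pp doc.search('entry').map { |e|
  # Find this entry's <id> once and reuse it for both attributes.
  id = e.at('id')
  [
    id['im:id'],        # use the prefixed attribute name
    id['im:bundleId'],
    e.at('title').text  # <title> lives in the same <entry>
  ]
}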
Which looks like:
[["557137623",
"com.rovio.angrybirdsstarwars",
"Angry Birds Star Wars - Rovio Entertainment Ltd"],
["557137623",
"com.rovio.angrybirdsstarwars",
"Angry Birds Star Wars - Rovio Entertainment Ltd"]]
This works because I'm walking through the entry tags. For each entry, I look for the id tag and remember it, which makes it easy to look into it repeatedly for the im:id and im:bundleId attributes. Then it's a simple case of looking inside e for the title tag.
I'm sure it could be done using some funky XPath, but I'm mortal and like to keep it simple.