T Chowdhury

Reputation: 21

How do I use Nokogiri to access URLs from within an Atom feed?

I am trying to extract the URLs from YouTube's XML Atom feed using Nokogiri.

I had some luck extracting the IDs (yay, namespaces), but I'm having a hard time extracting the URLs. For example, YouTube's API returns three different <media:thumbnail> tags and three different <media:content> tags. As the output below shows, the URLs are not showing up for either of those tags. My goal is to extract the URL from the first <media:thumbnail> and from the first <media:content>.

Here's a pastie of my code: http://pastie.org/1881669

This is the output in terminal for one entry:

{:group=>\"ComedyThe OMG Cat or the WTF cat - funny gobsmacked cat. The cats name is \\\"Choco\\\" and if i told you what she was looking at, I would have to kill you!!!The OMG Cat, omg cat, wtf cat, cat, cats, cat fail, the wtf cat, cute cats, cute animals, funny cats, funny cat video, omg, wtf, gobsmacked cat, gobsmacked, two girls one cup, reactionThe OMG Cat\", :category=>\"Comedy\", :content=>\"\", :description=>\"The OMG Cat or the WTF cat - funny gobsmacked cat. The cats name is \\\"Choco\\\" and if i told you what she was looking at, I would have to kill you!!!\", :keywords=>\"The OMG Cat, omg cat, wtf cat, cat, cats, cat fail, the wtf cat, cute cats, cute animals, funny cats, funny cat video, omg, wtf, gobsmacked cat, gobsmacked, two girls one cup, reaction\", :player=>\"\", :thumbnail=>\"\", :title=>\"The OMG Cat\"}]"

Upvotes: 0

Views: 615

Answers (1)

Danny

Reputation: 4124

Starting from the beginning (this example uses a YouTube playlist XML feed, but I believe the same approach works for any video feed):

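require 'nokogiri'
require 'open-uri' # provides URI.open; a bare open(url) no longer works on Ruby 3.0+

pid = '5ABDCC8D096B0853' # a playlist id is required to look up all its entries
 => "5ABDCC8D096B0853" 
doc = Nokogiri::XML(URI.open("http://gdata.youtube.com/feeds/api/playlists/#{pid}?v=2"))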

Now the Nokogiri XML document is in the doc variable. From there, you can get all the media:content and media:thumbnail node sets. A node set can be indexed like an array, so [0] returns the first matching element.

doc.xpath('//media:content')[0]
 => #<Nokogiri::XML::Element:0x82dd58d8 name="content" namespace=#<Nokogiri::XML::Namespace:0x82dd5ea0 prefix="media" href="http://search.yahoo.com/mrss/"> attributes=[#<Nokogiri::XML::Attr:0x82dd5770 name="url" value="http://www.youtube.com/ep.swf?id=5ABDCC8D096B0853">, #<Nokogiri::XML::Attr:0x82dd575c name="type" value="application/x-shockwave-flash">, #<Nokogiri::XML::Attr:0x82dd5748 name="format" namespace=#<Nokogiri::XML::Namespace:0x82dd3d44 prefix="yt" href="http://gdata.youtube.com/schemas/2007"> value="5">]> 

doc.xpath('//media:content')[0]['url']

 => "http://www.youtube.com/ep.swf?id=5ABDCC8D096B0853" 

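The URL lives in the element's url attribute rather than in its text, which is why ['url'] is used. Do the same for the thumbnail: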

doc.xpath('//media:thumbnail')[0]['url']
 => "http://i.ytimg.com/vi/eBefgm7hdpU/default.jpg" 

Upvotes: 3
