Reputation: 21
I am trying to extract the URLs from YouTube's XML Atom feed using Nokogiri.
I had some luck extracting the IDs (yay, namespaces), but have had a hard time extracting the URLs. For example, YouTube's API offers three different <media:thumbnail> tags and three different <media:content> tags. You can see below that the URLs are not showing up for either of those tags. My goal is to extract the URL from the first <media:thumbnail> and the first <media:content>, respectively.
Here's a pastie of my code: http://pastie.org/1881669
This is the output in terminal for one entry:
{:group=>"ComedyThe OMG Cat or the WTF cat - funny gobsmacked cat. The cats name is \"Choco\" and if i told you what she was looking at, I would have to kill you!!!The OMG Cat, omg cat, wtf cat, cat, cats, cat fail, the wtf cat, cute cats, cute animals, funny cats, funny cat video, omg, wtf, gobsmacked cat, gobsmacked, two girls one cup, reactionThe OMG Cat", :category=>"Comedy", :content=>"", :description=>"The OMG Cat or the WTF cat - funny gobsmacked cat. The cats name is \"Choco\" and if i told you what she was looking at, I would have to kill you!!!", :keywords=>"The OMG Cat, omg cat, wtf cat, cat, cats, cat fail, the wtf cat, cute cats, cute animals, funny cats, funny cat video, omg, wtf, gobsmacked cat, gobsmacked, two girls one cup, reaction", :player=>"", :thumbnail=>"", :title=>"The OMG Cat"}]
Upvotes: 0
Views: 615
Reputation: 4124
Starting from the beginning (this particular case looks at a YouTube playlist XML feed, but I believe you can do the same for any video feed):
require 'nokogiri'
require 'open-uri'
pid = '5ABDCC8D096B0853' # a playlist ID is required to look up all its entries
=> "5ABDCC8D096B0853"
doc = Nokogiri::XML(URI.open("http://gdata.youtube.com/feeds/api/playlists/#{pid}?v=2"))
Now the doc variable holds the Nokogiri XML document. From there, you can get all the media:content and media:thumbnail node sets. A node set can be indexed like an array, so [0] returns its first element, and the URL is stored in that element's url attribute rather than in its text.
doc.xpath('//media:content')[0]
=> #<Nokogiri::XML::Element:0x82dd58d8 name="content" namespace=#<Nokogiri::XML::Namespace:0x82dd5ea0 prefix="media" href="http://search.yahoo.com/mrss/"> attributes=[#<Nokogiri::XML::Attr:0x82dd5770 name="url" value="http://www.youtube.com/ep.swf?id=5ABDCC8D096B0853">, #<Nokogiri::XML::Attr:0x82dd575c name="type" value="application/x-shockwave-flash">, #<Nokogiri::XML::Attr:0x82dd5748 name="format" namespace=#<Nokogiri::XML::Namespace:0x82dd3d44 prefix="yt" href="http://gdata.youtube.com/schemas/2007"> value="5">]>
doc.xpath('//media:content')[0]['url']
=> "http://www.youtube.com/ep.swf?id=5ABDCC8D096B0853"
Do the same for the thumbnail:
doc.xpath('//media:thumbnail')[0]['url']
=> "http://i.ytimg.com/vi/eBefgm7hdpU/default.jpg"
Upvotes: 3