Reputation: 1272
http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml
returns, besides others, these lines:
(...)
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-e9ebd6e42ce1.mp4" type="video/mpeg" expression="full" width="512" height="288" bitrate="512" duration="398" />
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4" type="video/mpeg" expression="full" width="960" height="544" bitrate="1536" duration="398" />
(...)
How would I tell Nokogiri to extract only the line where bitrate="1536"?
I actually just need the URL from that element, so I expect the following string to be returned:
http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4
If anyone is interested: this will allow me to download the daily episode of the Sandmännchen, a German TV miniseries for little kids. :)
So far I have tried using simpleRSS with this:
(...)
rss.entries.each do |entry|
  pp entry
end
But that only returns the first item of the media:group "set" of links:
{:title=>"Sandmann vom 14. Oktober 2012",
:link=>"http://www.mdr.de/export/sandmann/folgen/video78338.html",
:description=>
"Die j\xC3\xBCngste Geschichte vom Sandmann gibt es f\xC3\xBCr 24 Stunden hier auf Abruf. Heute: Molly mag keine Schuhe. Das finden die anderen Monster merkw\xC3\xBCrdig, weil Monster Schuhe lieben.",
:pubDate=>2012-09-19 14:54:43 +0200,
:guid=>
"mp4:4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-8442e17c3177",
:media_content_url=>
"rtmp://x4100mp4dynonlc22033.f.o.f.lb.core-cdn.net/22033mdr/ondemand",
:media_content_type=>"fms/h264",
:media_content_height=>"272",
:media_content_width=>"480",
:media_title=>"Sandmann vom 14. Oktober 2012",
:media_thumbnail_url=>
"http://www.mdr.de/export/sandmann/folgen/sandmann864_v-standard43_zc-698fff06.jpg",
:media_thumbnail_height=>"135",
:media_thumbnail_width=>"180"}
Upvotes: 1
Views: 282
Reputation: 303224
require 'nokogiri'
require 'open-uri'
url = 'http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml'
doc = Nokogiri.XML(URI.open(url)) # URI.open, because Kernel#open no longer fetches URLs as of Ruby 3.0
doc.remove_namespaces! # Just to make our life simpler
content = doc.at_css('content[bitrate="1536"]')
puts content['url']
#=> http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-fd2af820-ec90-4f34-a58e-db1b9fdcc25a-c7cca1d51b4b.mp4
Upvotes: 0
Reputation: 54984
To make it easy (this assumes the namespaces were already stripped with doc.remove_namespaces!), simply:
doc.at('content[@bitrate="1536"]')[:url]
Upvotes: 0
Reputation: 27374
How about this:
doc.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> "http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss__zc-1a3071ad.xml"
By the way, the link doesn't work, so I wasn't able to test this on the full document.
UPDATE:
Using the info from your answer below, the same thing in Nokogiri (with URI.open, since Kernel#open no longer fetches URLs as of Ruby 3.0):
filme = Nokogiri::XML(URI.open('http://www.sandmann.de/static/san/app/filme.xml'))
folge = Nokogiri::XML(URI.open(filme.xpath('//filme/folge').text))
folge.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> "http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4"
Upvotes: 1
Reputation: 1272
This is what I came up with in the end. It uses crack instead of Nokogiri (which, I assume, is very powerful but has a rather steep learning curve. Plus, I simply don't understand it...). Crack seems to be more Ruby-ish and plays along nicely with the MRSS feed I am getting:
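require 'rubygems'
require 'crack'
require 'asciify'
require 'open-uri'
# Fetch the index file, which holds the URL of the current episode's feed
filme = Crack::XML.parse(URI.open('http://www.sandmann.de/static/san/app/filme.xml').read)
folge = Crack::XML.parse(URI.open(filme['filme']['folge']).read)
# Strip the fixed intro text so only the episode's own description remains
titel = folge['rss']['channel']['item']['description'].to_s.sub(/.*Die jüngste Geschichte vom Sandmann gibt es für 24 Stunden hier auf Abruf\. Heute: /, '')
# Pick the URL of the variant with bitrate="1536"
fileurl = ""
folge['rss']['channel']['item']['media:group']['media:content'].each do |x|
  fileurl << x['url'] if x['bitrate'] == "1536"
end
# Build the file name from the first sentence of the description
filename = titel.split(".").first.asciify + ".m4v"
filename.gsub!(" ", "_")
# Pass the arguments separately so the shell never interprets the file name or URL
system("curl", "-o", filename, fileurl)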
Just in case your kids want to watch, too ;)
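One thing to watch for with hash-based XML parsers: as far as I know, they typically hand back a single Hash when an element occurs once and an Array of Hashes when it repeats, so calling .each directly can misbehave on a feed with only one media:content. A small sketch of a defensive lookup follows. The helper name url_for_bitrate and the sample data are made up for illustration, and no XML parsing is involved here.

```ruby
# Hypothetical helper: returns the URL of the entry with the given bitrate,
# or nil if there is none. `contents` mimics the parsed media:content value:
# a Hash for a single element, an Array of Hashes for several.
def url_for_bitrate(contents, bitrate)
  entries = contents.is_a?(Array) ? contents : [contents]
  match = entries.compact.find { |entry| entry['bitrate'] == bitrate.to_s }
  match && match['url']
end

# Sample data shaped like the feed in the question; the URLs are placeholders.
sample = [
  { 'url' => 'http://example.invalid/low.mp4',  'bitrate' => '512'  },
  { 'url' => 'http://example.invalid/high.mp4', 'bitrate' => '1536' }
]

puts url_for_bitrate(sample, 1536)      # several entries (Array)
puts url_for_bitrate(sample.first, 512) # a single entry (Hash)
p url_for_bitrate(sample, 9999)         # no match
```

Returning nil on a miss also makes it easy to skip the download instead of calling curl with an empty URL.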
Upvotes: 0