Reputation: 962
I want to check whether the page at a user-entered URL contains something similar to:
<link rel="alternate" type="application/rss+xml" ... href="http://feeds.example.com/MyBlog"/>
That way I can eliminate one option when parsing for an Atom or RSS feed URL.
Is there a good way of doing this? Do I have to make my server fetch the user's URL and muck through the entire HTML?
I would need the feed URL in a variable to use after parsing.
Upvotes: 0
Views: 432
Reputation: 939
I believe you will indeed have to parse through all of it, because a single HTTP request returns the whole page and there's no way to ask for just the part you want. To fetch the page, you can use Ruby's Net::HTTP class as follows:
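require 'net/http'
require 'uri'
url = URI.parse('http://www.example.com/index.html')
# request_uri keeps the query string and falls back to "/" when the path is empty
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
# The regex below grabs the href of every <link> tag (not only feed links).
# scan returns one single-element array per match, so print its first element.
res.body.scan(/<link[^>]*href\s*=\s*["']([^"']*)/i).each do |match|
  puts match.first
end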
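The regex above matches every <link> tag, stylesheets included. As a rough sketch of narrowing that down to feed links only, the snippet below checks each tag's type attribute against the RSS and Atom MIME types and keeps the first result in a variable. The helper name feed_hrefs and the FEED_TYPES constant are made up for illustration, and the sample HTML stands in for res.body. Regex parsing like this is fragile (it assumes quoted attribute values with no ">" inside them), so treat it as a starting point rather than a robust parser.

```ruby
# MIME types that mark a <link> tag as a feed (RSS or Atom).
FEED_TYPES = ['application/rss+xml', 'application/atom+xml'].freeze

# Hypothetical helper: returns the href of every <link> tag in the given
# HTML string whose type attribute is one of FEED_TYPES.
def feed_hrefs(html)
  html.scan(/<link\b[^>]*>/i).map do |tag|
    # Collect the tag's quoted attributes into a hash, e.g. {"rel" => "alternate"}.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/)
               .map { |name, value| [name.downcase, value] }
               .to_h
    attrs['href'] if FEED_TYPES.include?(attrs['type'].to_s.downcase)
  end.compact
end

# Sample input standing in for res.body.
html = '<link rel="stylesheet" href="/site.css"/>' \
       '<link rel="alternate" type="application/rss+xml" href="http://feeds.example.com/MyBlog"/>'

feed_url = feed_hrefs(html).first # nil when no feed link is present
puts feed_url
# prints http://feeds.example.com/MyBlog
```

If feed_url is nil, the page does not advertise a feed this way and you can fall back to your other detection options.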
Upvotes: 0
Reputation: 56
You could use the Nokogiri gem - http://www.nokogiri.org/
Here's an example using its CSS-style document searching syntax:
require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri; on Ruby 3.0+ a bare open() no longer fetches URLs
doc = Nokogiri::HTML(URI.open('http://www.example.com/'))
rss_xml_nodes = doc.css('link[rel="alternate"][type="application/rss+xml"]')
rss_xml_hrefs = rss_xml_nodes.collect { |node| node[:href] }
rss_xml_nodes will contain a Nokogiri::XML::NodeSet (an array-like collection) of the matching elements
rss_xml_hrefs will contain an array of strings holding each node's href attribute
rss_xml_nodes.empty?
=> false
rss_xml_hrefs
=> ["http://www.example.com/rss-feed.xml", "http://www.example.com/rss-feed2.xml"]
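One thing to watch for: the href values in rss_xml_hrefs can be relative (for example "/rss-feed.xml"), so you may want to resolve them against the page URL before storing one in a variable. A small sketch using only the standard library's URI.join follows; page_url and the sample hrefs are example values I made up, not output from a real page.

```ruby
require 'uri'

# Example values: the URL the user entered and hrefs like those in rss_xml_hrefs.
page_url = 'http://www.example.com/blog/'
hrefs    = ['/rss-feed.xml', 'http://feeds.example.com/MyBlog']

# URI.join resolves a relative reference against the page URL and
# leaves an already-absolute URL unchanged.
feed_urls = hrefs.map { |href| URI.join(page_url, href).to_s }

feed_url = feed_urls.first # nil when the page advertises no feed
puts feed_urls
# prints:
# http://www.example.com/rss-feed.xml
# http://feeds.example.com/MyBlog
```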
Upvotes: 2