Ken W

Reputation: 962

Rails: how do I parse link tags from a user-entered URL?

I want to check whether the page at a user-entered URL contains something similar to:

<link rel="alternate" type="application/rss+xml" ... href="http://feeds.example.com/MyBlog"/>

That way I can eliminate one option when parsing for an Atom or RSS feed URL.

Is there a good way of doing this? Do I have to make my server fetch and parse the entire HTML of the user's URL and muck through all of it?

I need the feed URL stored in a variable so I can use it after parsing.

Upvotes: 0

Views: 432

Answers (2)

Ramfjord

Reputation: 939

I believe you will indeed have to parse through all of it, because the only way to get any part of the page is to fetch the whole thing with a single HTTP request. For this, you can use Ruby's Net::HTTP class as follows:

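require 'net/http'
require 'uri'

url = URI.parse('http://www.example.com/index.html')
# request_uri keeps the query string and falls back to "/" when the path is empty
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') do |http|
  http.request(req)
end

# The regex below grabs the href of every <link> tag (case-insensitive).
# scan returns one array per match (one entry per capture group), so flatten it.
res.body.scan(/<link[^>]*href\s*=\s*["']([^"']*)/i).flatten.each do |href|
  puts href
end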
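The regex above returns the href of every <link> tag, including stylesheets and icons. To keep only feed links, one option is to first pull out each <link> tag and then check its rel and type attributes. Below is a minimal sketch, assuming double- or single-quoted attribute values; feed_hrefs and FEED_TYPES are hypothetical names, and the html string stands in for res.body. Unquoted attributes or quotes inside values are not handled, so an HTML parser such as Nokogiri is the more robust choice.

```ruby
# Hypothetical list of MIME types treated as feeds (RSS and Atom)
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Hypothetical helper: returns the href of each <link rel="alternate"> tag
# whose type is one of FEED_TYPES, in document order.
def feed_hrefs(html)
  html.scan(/<link\b[^>]*>/i).filter_map do |tag|
    # Collect the quoted name="value" pairs of this single tag into a hash
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    # rel can hold several space-separated values, e.g. rel="alternate home"
    next unless attrs['rel']&.downcase&.split&.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type']&.downcase)
    attrs['href']
  end
end

html = <<~HTML
  <head>
    <link rel="stylesheet" href="/style.css">
    <link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.example.com/MyBlog"/>
    <link rel="alternate" type="application/atom+xml" href="/atom.xml">
  </head>
HTML

# First feed found, kept in a variable for later use
feed_url = feed_hrefs(html).first
puts feed_url
```

This prints http://feeds.example.com/MyBlog; feed_url is nil when the page has no feed links.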

Upvotes: 0

user1678885

Reputation: 56

You could use the Nokogiri gem - http://www.nokogiri.org/

Here's an example using its CSS-selector search syntax:

require 'nokogiri'
require 'open-uri'

# URI.open fetches the page; plain Kernel#open no longer opens URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://www.example.com/'))
rss_xml_nodes = doc.css('link[rel="alternate"][type="application/rss+xml"]')
rss_xml_hrefs = rss_xml_nodes.collect { |node| node[:href] }

rss_xml_nodes will contain a Nokogiri::XML::NodeSet (an array-like collection) of the matching <link> elements

rss_xml_hrefs will be an array of strings, one per node, holding each node's href attribute

rss_xml_nodes.empty?
=> false

rss_xml_hrefs
=> ["http://www.example.com/rss-feed.xml", "http://www.example.com/rss-feed2.xml"] 
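The href values come back exactly as written in the page, so a site using href="/atom.xml" yields a relative path rather than a full URL. Before storing the feed URL in a variable, it may need resolving against the page's own URL. A minimal sketch using only Ruby's standard URI.join; absolute_feed_urls and the sample URLs are hypothetical, and URI.join raises URI::InvalidURIError for malformed hrefs (for example, ones containing unescaped spaces).

```ruby
require 'uri'

# Hypothetical helper: resolves each (possibly relative) href against the
# URL of the page the <link> tags were read from.
def absolute_feed_urls(page_url, hrefs)
  hrefs.map { |href| URI.join(page_url, href).to_s }
end

page_url = 'http://www.example.com/blog/index.html'
# Absolute, root-relative and path-relative examples
hrefs = ['http://www.example.com/rss-feed.xml', '/atom.xml', 'feed.xml']

feed_urls = absolute_feed_urls(page_url, hrefs)
puts feed_urls
```

Absolute hrefs pass through unchanged, /atom.xml resolves to http://www.example.com/atom.xml, and feed.xml resolves relative to the /blog/ directory.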

Upvotes: 2
