Winston

Reputation: 225

How to find out the exact RSS XML path of a website?

How do I get the exact feed.xml/rss.xml/atom.xml path of a website?

For example, I supply "http://www.example.com/news/today/this_is_a_news", but the RSS feed lives at "http://www.example.com/rss/feed.xml". Most modern browsers already have this feature, and I'm curious how they find the feed.

Can you give example code in Ruby, Python, or Bash?

Upvotes: 0

Views: 2012

Answers (3)

user2852263

Reputation: 577

In Python, use this classic solution: http://www.aaronsw.com/2002/feedfinder/

Upvotes: 0

yabt

Reputation: 11

You may also use a command line tool like xmlstarlet (together with HTML tidy):

# version 1
curl -s http://stackoverflow.com/questions/2441954/how-to-find-out-the-exact-rss-xml-path-of-a-website | 
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -T -t -m "//*[local-name()='link']" --if "@type='application/atom+xml' or @type='application/rss+xml'" -m "@href" -v '.' -n

# version 2
curl -s http://stackoverflow.com/questions/2441954/how-to-find-out-the-exact-rss-xml-path-of-a-website | 
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:link[@type='application/atom+xml' or @type='application/rss+xml']" -v "@href" -n

Upvotes: 1

the Tin Man

Reputation: 160621

Something like this in Ruby will work...

require 'nokogiri'
require 'open-uri'

# URI.open (Ruby 2.5+) is used because Kernel#open stopped accepting URLs in Ruby 3.0
html = Nokogiri::HTML(URI.open('http://stackoverflow.com/questions/2441954/how-to-find-out-the-exact-rss-xml-path-of-a-website'))
puts html.css('link[type="application/atom+xml"]').first.attr('href')
#  => "/feeds/question/2441954"

Notice that the href is an absolute path without a scheme or host, which is legal, so you'd need to prepend the host info.
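
For the Python side of the question, here is a minimal sketch of that "prepend the host" step, assuming you already have the page URL and the href pulled from the link tag. The helper name resolve_feed_url is made up for illustration; urljoin is from the standard library and also copes with hrefs that are already full URLs or are relative to the current directory.

```python
from urllib.parse import urljoin


def resolve_feed_url(page_url, href):
    """Resolve a feed href found in a page against that page's own URL.

    resolve_feed_url is a hypothetical helper name, not part of any library.
    """
    return urljoin(page_url, href)


page_url = ("http://stackoverflow.com/questions/2441954/"
            "how-to-find-out-the-exact-rss-xml-path-of-a-website")

# Host-relative path, as returned by the Ruby example above
print(resolve_feed_url(page_url, "/feeds/question/2441954"))
# => http://stackoverflow.com/feeds/question/2441954
```

Resolving against the full page URL rather than just gluing on the host means a document-relative href like "feed.xml" should also end up in the right place.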

Also, the type could be "application/rss+xml" or "application/rdf+xml" instead of "application/atom+xml", and a page can contain multiple links, so you'll need to decide how to handle multiples. According to the autodiscovery docs, the first one listed should be the preferred one, but in my experience that isn't always the case. The docs also say the links should point to different content rather than alternate formats of the same content (RSS and Atom versions of the same feed), but again, I've seen that happen.
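
To illustrate handling several types and several links at once, here is a rough Python sketch using only the standard library's html.parser. The names FEED_TYPES, FeedLinkParser and find_feed_links are invented for this example, and the sample HTML is made up. It returns every matching link in document order and leaves the choice of which one to prefer to the caller, since "first one wins" is not always reliable.

```python
from html.parser import HTMLParser

# The three MIME types mentioned above (name is hypothetical)
FEED_TYPES = {
    "application/atom+xml",
    "application/rss+xml",
    "application/rdf+xml",
}


class FeedLinkParser(HTMLParser):
    """Collects (type, href) pairs from <link> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />
        if tag != "link":
            return
        attr = dict(attrs)
        link_type = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        if link_type in FEED_TYPES and href:
            self.feeds.append((link_type, href))


def find_feed_links(html_text):
    """Return all feed links found in html_text, in document order."""
    parser = FeedLinkParser()
    parser.feed(html_text)
    return parser.feeds


# Made-up sample page with one stylesheet link and two feed links
sample = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="/feeds/question/2441954">
<link rel="alternate" type="application/rss+xml" href="/rss/feed.xml" />
</head><body></body></html>"""

for link_type, href in find_feed_links(sample):
    print(link_type, href)
```

Like the Ruby snippet, this matches on the type attribute only; a stricter version might also require rel="alternate".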

Upvotes: 2
