Feed URL from HTML using Python

Question

The RSS feed URL is available in a site's metadata (if the site has one). Is there a way to extract the feed URL(s) of a page using the urllib2 or HTMLParser modules? Or is there a better module available?

Thanks.

Zach Kelling · Accepted Answer

I prefer lxml. It has a very nice API, and its XPath support makes this fairly simple to accomplish:

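import lxml.html

# url_to_site: URL (or local path) of the page to inspect
doc = lxml.html.parse(url_to_site)
# href of every <link> element that advertises an RSS feed
feeds = doc.xpath('//link[@type="application/rss+xml"]/@href')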
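
Since the question also asks about HTMLParser, here is a rough standard-library sketch of the same idea for comparison. It uses html.parser, the Python 3 name of the HTMLParser module. The names FeedLinkParser and find_feed_urls and the example.com URL are made up for illustration. Matching Atom feeds and resolving relative hrefs are my own assumptions, not part of the lxml snippet above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed <link>.
# Including Atom is an assumption; the snippet above matches RSS only.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Hypothetical helper: collects hrefs of <link> tags that advertise a feed."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        link_type = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        if link_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_urls(html_text, base_url=""):
    """Return feed URLs found in html_text, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    return parser.feeds


# Inline sample page so the sketch runs without network access
sample = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="stylesheet" type="text/css" href="/style.css">
</head><body></body></html>"""

print(find_feed_urls(sample, "http://example.com/blog/"))
```

To run it against a live page, fetch the HTML first (for example with urllib.request.urlopen), decode it to a string, and pass that string together with the page URL to find_feed_urls.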
