bdhar

Reputation: 22975

Feed URL from HTML using Python

The RSS feed URL is available in a site's metadata (if the site has a feed). Is there a way to extract a page's feed URL(s) using the urllib2 or HTMLParser modules? Or is there a better module available?

Thanks.
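For comparison, here is a minimal standard-library sketch of the HTMLParser route the question asks about. It assumes Python 3, where the HTMLParser module became `html.parser` and urllib2's `urlopen` moved to `urllib.request`. The names `FeedLinkParser`, `find_feeds`, and `find_feeds_at` are hypothetical. The sketch also treats Atom links (`application/atom+xml`) as feeds, which the question does not mention.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# MIME types commonly used in <link> tags to advertise feeds (Atom is an assumption)
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects the href of every <link> tag whose type is a feed MIME type."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "link":
            return
        attributes = dict(attrs)
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if link_type in FEED_TYPES and href:
            # Resolve relative hrefs such as "/feed.xml" against the page URL
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url=""):
    """Return the feed URLs advertised in an HTML string."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds


def find_feeds_at(url):
    """Download a page and return its feed URLs (needs network access)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server sends no charset
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return find_feeds(html, base_url=url)
```

Usage on an inline page:

```python
page = '<head><link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>'
print(find_feeds(page, "http://example.com/blog/"))  # ['http://example.com/feed.xml']
```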

Upvotes: 1

Views: 214

Answers (1)

Zach Kelling

Reputation: 53819

I prefer lxml. It has a very nice API, and its XPath support makes this fairly simple to accomplish:

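import lxml.html

url_to_site = "http://example.com/"  # placeholder: the page to inspect
doc = lxml.html.parse(url_to_site)  # lxml fetches and parses the page
feeds = doc.xpath('//link[@type="application/rss+xml"]/@href')  # list of feed URLs, as written in the page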
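The same XPath idea can be extended under two assumptions the answer does not make: pages may advertise Atom feeds as well as RSS, and `href` values may be relative. The sketch below parses an HTML string with `lxml.html.fromstring`, so it can be tried without network access, and rewrites relative links with lxml's `make_links_absolute` before running the query. `feeds_from_html` is a hypothetical helper name.

```python
import lxml.html

# Match <link> elements for RSS or Atom feeds and return their href attributes
FEED_XPATH = ('//link[@type="application/rss+xml" '
              'or @type="application/atom+xml"]/@href')


def feeds_from_html(html, base_url):
    """Return absolute feed URLs advertised in an HTML string."""
    root = lxml.html.fromstring(html)
    # Rewrite relative href/src attributes in place, resolved against base_url
    root.make_links_absolute(base_url)
    # xpath() returns string subclasses; convert them to plain str
    return [str(href) for href in root.xpath(FEED_XPATH)]
```

Unlike the one-liner above, this matches the `type` value exactly (XPath string comparison is case-sensitive), so a page that writes `Application/RSS+XML` would not be matched.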

Upvotes: 2
