Mo J. Mughrabi

Reputation: 7007

Building blogs RSS feeds using Django (Python)

As the title says, I'm trying to build a small application that aggregates RSS feeds from different blogs. I'm exploring feedparser for this, but I'm stuck trying to write a piece of code that detects a site's RSS feed.

Most people would just enter www.mysite.com/blog, which is not the URL of the RSS feed itself. Is there a way for me to detect the RSS feed? I'm trying to replicate the browser behavior, where the browser can find the RSS URL from the page.

Any ideas?

Upvotes: 0

Views: 792

Answers (3)

marius_5

Reputation: 501

There is a great app for exactly this, called Feedjack.

But you will find yourself banging your head against the wall when the RSS feed contains less than 100 characters.

For full control (aggregating exactly what you need) and for websites without any RSS feeds, I would recommend Scrapy.

Upvotes: 0

Chris Pratt

Reputation: 239470

Use something like BeautifulSoup to parse the HTML document and look for the RSS feeds. The following is a basic example and not necessarily the most efficient:

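from bs4 import BeautifulSoup

# html_doc is the HTML source of the page the user entered
soup = BeautifulSoup(html_doc, 'html.parser')

# Find every <link> tag that advertises an RSS feed
rss_links = soup.select('link[type="application/rss+xml"]')
for link in rss_links:
    rss_url = link.get('href')  # may be relative to the page URL
    print(rss_url)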

See the full BeautifulSoup documentation.

Upvotes: 1

Martijn Pieters

Reputation: 1125398

Browsers use RSS feed auto-discovery and Atom feed auto-discovery to find feeds on a given web page.

For example, the question lists are available via an Atom feed which is linked in the HTML header of the associated pages with:

<link rel="alternate" type="application/atom+xml" title="Feed of questions tagged python" href="/feeds/tag/python" />

You'll need to parse out the <link rel="alternate"> tags in a given page to discover these; anything with an application/atom+xml or application/rss+xml type fits.
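As a rough illustration of that parsing step, here is a minimal sketch that uses only the Python standard library. The names FeedLinkParser and discover_feeds are made up for this example, and the base URL is the placeholder from the question; the sketch assumes you already have the page's HTML as a string, and it resolves relative href values against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link> as a feed, per the answer above
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Hypothetical helper: collects feed URLs from <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower().strip()
        href = attrs.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Turn relative hrefs into absolute URLs
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the absolute feed URLs advertised in the given HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


# Example page using the <link> tag shown above; the base URL is a placeholder
page = (
    '<html><head>'
    '<link rel="alternate" type="application/atom+xml" '
    'title="Feed of questions tagged python" href="/feeds/tag/python" />'
    '</head></html>'
)
feeds = discover_feeds(page, "http://www.mysite.com/blog")
print(feeds)
```

Each discovered URL could then be handed to feedparser.parse() for the actual aggregation. A page may advertise several feeds, so the function returns a list rather than a single URL.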

Upvotes: 1
