Hugo Sousa

Reputation: 916

Asynchronous feedparser requests

I'm using feedparser (Python) to get some RSS entries from several websites.

How can I do asynchronous requests using feedparser? I mean, I want to get some RSS entries but I don't want to wait for the response. A callback function should be called when I get the response from the feedparser request. After the request (and probably before the reply) I want to do some computation.

Thank you all, Hugo

Upvotes: 4

Views: 3522

Answers (2)

PirateApp

Reputation: 6212

2019 update

Use asyncio with aiohttp to download the feeds, then hand the response text to feedparser:

import asyncio

import aiohttp
import async_timeout
import feedparser

INTERVAL = 60  # seconds to wait between polling rounds


async def fetch(session, url):
    # Give up on a feed that takes longer than 10 seconds to respond.
    # Recent async_timeout releases require "async with" here, not a plain "with".
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            return await response.text()


async def fetchfeeds(feedurls, ircsock):
    # ircsock is unused in this snippet; the print() calls mark where a
    # message would be sent.
    # Track the most recent entry seen for each feed.
    feeds = []

    for url in feedurls:
        feeds.append({'url': url, 'last': ""})

    # One session is reused for every request instead of opening one per feed.
    async with aiohttp.ClientSession() as session:
        while True:
            for feed in feeds:
                try:
                    html = await fetch(session, feed['url'])
                except (aiohttp.ClientError, asyncio.TimeoutError):
                    continue  # skip this feed until the next round
                rss = feedparser.parse(html)
                if not rss['entries']:
                    continue  # empty or unparseable feed
                newest = rss['entries'][0]
                if feed['last']:
                    if feed['last']['title'] != newest['title'] and feed['last']['link'] != newest['link']:
                        print("new entry")
                        feed['last'] = newest

                        print("MSG {}".format(feed['last']['title']))
                        print("MSG {}".format(feed['last']['link']))
                else:
                    feed['last'] = newest

            await asyncio.sleep(INTERVAL)


asyncio.run(fetchfeeds(['https://n-o-d-e.net/rss/rss.xml',
    "http://localhost:8000/rss.xml"], None))

Upvotes: 4

Julien Genestoux

Reputation: 33012

You're probably better off decoupling the fetching from the parsing. Feedparser is an amazing parsing library, but probably not the best HTTP client library. Luckily, that's fairly easy to do, as Feedparser can also parse a blob of text.

This means you can pick any HTTP library to do the actual polling, as long as it supports your asynchronous requirement. You'll probably end up using something like Twisted and its web client (twisted.web.client).
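To make the decoupling concrete, here is a minimal sketch that swaps in the standard library's thread pool in place of Twisted, purely to keep the example dependency-free on the fetching side. The names fetch_bytes, parse_feed, fetch_feeds and on_feed are hypothetical, chosen for illustration; the only feedparser call used is feedparser.parse on an already-downloaded body. Each feed is downloaded in a background thread and the callback fires as soon as that feed has been parsed, so the caller is free to do other work in the meantime.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch_bytes(url, timeout=10):
    """Blocking HTTP GET with the standard library; returns the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_feed(body):
    """Parse an already-downloaded feed body with feedparser."""
    import feedparser  # third-party; only the parsing step depends on it
    return feedparser.parse(body)


def fetch_feeds(urls, on_feed, fetch=fetch_bytes, parse=parse_feed, max_workers=4):
    """Start one background job per URL and return immediately.

    on_feed(url, parsed, error) is called once per URL when its job ends:
    parsed is the parser's result and error is None on success, otherwise
    parsed is None and error is the exception that was raised.
    The callback normally runs in a worker thread, not the caller's thread.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)

    def job(url):
        # Fetching and parsing stay separate steps, so either can be swapped.
        return parse(fetch(url))

    for url in urls:
        future = executor.submit(job, url)

        def done(finished, url=url):  # bind the current url to this callback
            error = finished.exception()
            on_feed(url, None if error else finished.result(), error)

        future.add_done_callback(done)

    return executor
```

A caller would do something like executor = fetch_feeds(urls, on_feed), carry on with its own computation, and call executor.shutdown(wait=True) before exiting so that outstanding downloads can finish. Because fetch and parse are plain parameters, the same structure should work with another HTTP client if the thread pool turns out to be the wrong fit.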

Another solution, of course, is to avoid doing all that expensive polling yourself and rely on a service like Superfeedr, which uses webhooks to send you only what's new in a given feed.

Upvotes: 5
