timebandit

Reputation: 830

How do I retrieve all RSS entries that are no more than X days old?

I am using Python and the RSS feedparser module to retrieve RSS entries. However I only want to retrieve a news item if it is no more than x days old.

For example, if x=4 then my Python code should not fetch anything published more than four days before the current date.

Feedparser allows you to scrape the 'published' date for the entry, however it is of type unicode and I don't know how to convert this into a datetime object.

Here is some example input:

date = 'Thu, 29 May 2014 20:39:20 +0000'

Here is what I have tried:

from datetime import datetime
date_object = datetime.strptime(date, '%a, %d %b %Y %H:%M:%S %z')

This is the error I get:

ValueError: 'z' is a bad directive in format '%a, %d %b %Y %H:%M:%S %z'

This is what I hope to do with it:

from datetime import datetime
a = datetime(today)
b = datetime(RSS_feed_entry_date)
>>> a-b
datetime.timedelta(6, 1)
(a-b).days
6

Upvotes: 0

Views: 1153

Answers (2)

heinst

Reputation: 8786

from datetime import datetime

date = 'Thu, 29 May 2014 20:39:20 +0000'
# strptime on Python 2 rejects %z, so split off the trailing UTC offset
# ('+0000') and parse only the part before it. The offset is discarded.
restOfDate, offset = date.rsplit(' ', 1)
date_object = datetime.strptime(restOfDate, '%a, %d %b %Y %H:%M:%S')
print(date_object)

This yields 2014-05-29 20:39:20. While researching your time zone error, I came across another SO question which says that strptime has trouble with time zones (link to question).
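If you would rather not split the string by hand, the standard library's email.utils.parsedate_tz understands this date format, including the offset. Here is a minimal sketch of that approach; parse_rss_date and age_in_days are names I made up for illustration, and the reference date is fixed only so the output is reproducible:

```python
from datetime import datetime, timedelta
from email.utils import parsedate_tz


def parse_rss_date(text):
    """Parse an RFC 822 style date string into a naive datetime in UTC."""
    parsed = parsedate_tz(text)
    if parsed is None:
        raise ValueError('unparseable date: %r' % (text,))
    # The last element is the UTC offset in seconds, or None if absent.
    offset_seconds = parsed[9] or 0
    return datetime(*parsed[:6]) - timedelta(seconds=offset_seconds)


def age_in_days(text, now):
    """Whole days between `now` (naive UTC) and the parsed date."""
    return (now - parse_rss_date(text)).days


published = 'Thu, 29 May 2014 20:39:20 +0000'
reference = datetime(2014, 6, 4, 20, 39, 21)  # stand-in for "today"
print(parse_rss_date(published))
print(age_in_days(published, reference))
```

In real use you would pass datetime.utcnow() (or an equivalent UTC "now") instead of the fixed reference value.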

Upvotes: 1

Chris Clarke

Reputation: 2181

You already have a time.struct_time for this: look at f.entries[0].published_parsed.

You can use time.mktime to convert this to a timestamp and compare it with time.time() to see how far in the past it is:

An example:

>>> import feedparser
>>> import time

>>> f = feedparser.parse("http://feeds.bbci.co.uk/news/rss.xml")
>>> f.entries[0].published_parsed
time.struct_time(tm_year=2014, tm_mon=5, tm_mday=30, tm_hour=14, tm_min=6, tm_sec=8, tm_wday=4, tm_yday=150, tm_isdst=0)

>>> time.time() - time.mktime(f.entries[0].published_parsed)
4985.511506080627

Obviously this will be a different value for you, but if it is less than 86400 * 4 (the number of seconds in 4 days, in your case), the entry is recent enough.

So, concisely

[entry for entry in f.entries if time.time() - time.mktime(entry.published_parsed) < (86400*4)]

would give you your list
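One caveat: feedparser documents published_parsed as being in UTC, while time.mktime treats its argument as local time, so the result can be off by your UTC offset. Below is a rough sketch of the same filter using calendar.timegm instead. The recent_entries helper and the FakeEntry class are hypothetical names that exist only so the example runs without a network connection; with a real feed you would pass f.entries.

```python
import calendar
import time

SECONDS_PER_DAY = 86400


def recent_entries(entries, max_age_days, now=None):
    """Return entries published no more than max_age_days before now."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    recent = []
    for entry in entries:
        published = getattr(entry, 'published_parsed', None)
        if published is None:
            continue  # skip entries without a parsed date
        # timegm treats the struct_time as UTC; mktime would assume local time.
        if calendar.timegm(published) >= cutoff:
            recent.append(entry)
    return recent


class FakeEntry(object):
    """Hypothetical stand-in for a feedparser entry, for demonstration only."""

    def __init__(self, title, epoch_seconds):
        self.title = title
        self.published_parsed = time.gmtime(epoch_seconds)


now = 1401458768  # fixed reference time so the example is reproducible
entries = [
    FakeEntry('fresh', now - 1 * SECONDS_PER_DAY),
    FakeEntry('stale', now - 5 * SECONDS_PER_DAY),
]
print([e.title for e in recent_entries(entries, 4, now=now)])
```

Skipping entries that have no published_parsed is a design choice on my part; some feeds omit the date, and you may prefer to keep those entries instead.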

Upvotes: 2
