xralf

Correctly sort RSS items by time

I'm collecting RSS items from several RSS channels, and I'd like to sort them by publication time, from newest to oldest, correctly taking each item's time zone into account. So far, I have the following code:

import feedparser
import dateutil.parser

rss_channels = [
    "https://www.novinky.cz/rss",
    "https://news.ycombinator.com/rss",
    "https://unix.stackexchange.com/feeds",
    "https://www.lupa.cz/rss/clanky/",
    "https://www.lupa.cz/rss/n/digizone/",
    "https://www.zive.cz/rss/sc-47/",
    "https://bitcoin.stackexchange.com/feeds",
    "https://vi.stackexchange.com/feeds",
    "https://askubuntu.com/feeds",
]

latest_items = []

for url in rss_channels:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        pub_date_str = entry.published

        try:
            pub_date = dateutil.parser.parse(pub_date_str, ignoretz=True, fuzzy=True)
            if pub_date.tzinfo is None:
                pub_date = pub_date.replace(tzinfo=dateutil.tz.tzutc())
            latest_items.append((entry.title, pub_date, entry.link))
        except Exception as e:
            print(str(e))

latest_items.sort(key=lambda x: x[1], reverse=True)

for title, pub_date, url in latest_items:
    print(f"{pub_date.strftime('%Y-%m-%d %H:%M:%S %z')} - {title} - {url}")

I'm not sure whether the code is correct. Could you confirm it, or show me what's wrong? The code is also very slow, so if it can be made faster, that would be great.
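On the correctness question: with ignoretz=True, dateutil ignores any time zone in the string and returns a naive datetime, so the `if pub_date.tzinfo is None` branch always runs and every item is labelled UTC at its local wall-clock time. Items from feeds that publish with another offset (for example +0100) can then end up in the wrong order. The sketch below reproduces the effect with the standard library's email.utils.parsedate_to_datetime instead of dateutil, so it runs without extra packages; the two date strings and the helper names are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Two made-up RSS pubDate values from feeds in different time zones.
stamps = {
    "prague_item": "Mon, 01 Jan 2024 12:00:00 +0100",  # 11:00 UTC
    "utc_item": "Mon, 01 Jan 2024 11:30:00 +0000",     # 11:30 UTC, the newer one
}


def keep_offset(s: str) -> datetime:
    """Parse an RFC 822 date and keep its UTC offset."""
    dt = parsedate_to_datetime(s)
    if dt.tzinfo is None:  # only when the offset is missing or "-0000"
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def drop_offset(s: str) -> datetime:
    """Mimic ignoretz=True followed by the tzutc() fallback: keep the
    wall-clock time, discard the offset, then label the result as UTC."""
    return parsedate_to_datetime(s).replace(tzinfo=timezone.utc)


correct = sorted(stamps, key=lambda k: keep_offset(stamps[k]), reverse=True)
wrong = sorted(stamps, key=lambda k: drop_offset(stamps[k]), reverse=True)
print("offset kept:   ", correct)
print("offset dropped:", wrong)
```

Dropping ignoretz=True from the dateutil.parser.parse call, while keeping the tzutc() fallback for strings that carry no offset, should make the sort compare the actual instants.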

Answers (1)

xralf

Finally, I used the following snippet.

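import dateutil.parser
import pytz

prague = pytz.timezone('Europe/Prague')
try:
  pub_date = dateutil.parser.parse(entry.published)
  # Keep the feed's own offset; assume Prague time only when the string has none.
  pub_date = prague.localize(pub_date) if pub_date.tzinfo is None else pub_date.astimezone(prague)
  # ...
except Exception as e:
  print(str(e))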
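The snippet above only deals with the dates; the slowness mentioned in the question most likely comes from the feeds being downloaded one after another, since each feedparser.parse(url) call waits for its HTTP response before the loop moves on. A minimal sketch, assuming feedparser is used for fetching: run the downloads in a thread pool, and use the published_parsed field that feedparser already fills in (a time.struct_time normalized to UTC), so no second parse with dateutil is needed. The helper names (entry_time, collect_items), the fallback to updated_parsed, the max_workers=8 default and the fake feeds in the demo are assumptions made for illustration, and the sketch assumes that calling feedparser.parse from several threads at once is safe for your feedparser version.

```python
import calendar
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from types import SimpleNamespace


def entry_time(entry):
    """Return the entry's publication time as an aware UTC datetime, or None.

    feedparser stores parsed dates as time.struct_time values in UTC, so
    calendar.timegm (not time.mktime, which assumes local time) gives the epoch.
    """
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    if parsed is None:
        return None
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


def collect_items(urls, fetch, max_workers=8):
    """Fetch all feeds concurrently and return (title, utc_time, link), newest first.

    `fetch` maps a URL to a parsed feed object with an `entries` attribute;
    in real use that would be feedparser.parse.
    """
    items = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the downloads in parallel threads but yields results in URL order.
        for feed in pool.map(fetch, urls):
            for entry in feed.entries:
                when = entry_time(entry)
                if when is None:
                    continue  # skip entries without a usable date
                items.append((entry.get("title", ""), when, entry.get("link", "")))
    items.sort(key=lambda item: item[1], reverse=True)
    return items


# Offline demo with hypothetical feeds instead of real network requests.
fake_feeds = {
    "feed-a": SimpleNamespace(entries=[
        {"title": "older", "link": "https://example.com/1",
         "published_parsed": datetime(2024, 1, 1, 10, 0).timetuple()},
    ]),
    "feed-b": SimpleNamespace(entries=[
        {"title": "newer", "link": "https://example.com/2",
         "published_parsed": datetime(2024, 1, 1, 11, 0).timetuple()},
        {"title": "undated", "link": "https://example.com/3"},
    ]),
}
items = collect_items(list(fake_feeds), fake_feeds.__getitem__)
for title, when, link in items:
    print(when.strftime("%Y-%m-%d %H:%M:%S %z"), "-", title, "-", link)
```

With feedparser installed, the real call would be collect_items(rss_channels, feedparser.parse). Because every sort key is an aware UTC datetime, items from different feeds are compared by the actual instant; showing them in Europe/Prague time is then only a display step, for example with astimezone.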
