Using Python to get Substack posts without scraping

Question

I want to create a dataframe of Substack posts from all the newsletters I subscribe to. But feedparser with Substack's RSS feeds only seems to go back ~20 posts, even if a particular newsletter has hundreds of older posts.

Is there a way to use RSS to get all the old posts too? Or is there another method that returns the same data as the RSS feed without scraping (e.g. with BeautifulSoup)?

import feedparser
import pandas as pd

# Feed URLs for the newsletters to collect
rawrss = ['https://heathercoxrichardson.substack.com/feed', 'https://marcstein.substack.com/feed']

columns = ['title', 'link', 'summary', 'summary_detail', 'content', 'published']

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        # .get() yields None instead of raising if an entry lacks a field
        posts.append(tuple(post.get(col) for col in columns))

df = pd.DataFrame(posts, columns=columns)
print(df)
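
To show how far back each feed actually goes, here is a minimal sketch that counts the entries and reports the oldest and newest publication dates. The helper names summarize_entries and summarize_feed are my own, not part of feedparser, and the sketch assumes each entry carries the published_parsed field (a time.struct_time) that feedparser normally sets when it can parse the date.

```python
import time


def summarize_entries(entries):
    """Return the entry count plus the oldest and newest publication dates.

    `entries` is any iterable of dict-like objects, such as feed.entries.
    Entries without a parsed publication date are counted but not dated.
    """
    entries = list(entries)
    dated = [e["published_parsed"] for e in entries if e.get("published_parsed")]

    def fmt(t):
        return time.strftime("%Y-%m-%d", t)

    return {
        "count": len(entries),
        # struct_time values compare like tuples, so min/max order them by date
        "oldest": fmt(min(dated)) if dated else None,
        "newest": fmt(max(dated)) if dated else None,
    }


def summarize_feed(url):
    """Fetch one feed and summarize it (hypothetical convenience wrapper)."""
    import feedparser  # third-party; imported here so summarize_entries has no dependency

    return summarize_entries(feedparser.parse(url).entries)
```

Calling summarize_feed(url) for each url in rawrss gives the per-feed count and date range, which makes the cutoff I'm describing easy to see.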


Answers (1)
