siamii

How to scrape all comments from a subreddit on Reddit?

I'm trying to scrape all comments from a subreddit. I found a library called PRAW, whose documentation gives this example:

import praw
r = praw.Reddit('Comment parser example by u/_Daimon_')
subreddit = r.get_subreddit("python")
comments = subreddit.get_comments()

However, this returns only the 25 most recent comments. How can I fetch all comments in the subreddit? The Reddit web interface has a "next" button, so it should be possible to go back through the history page by page.
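Reddit's "next" button pages through a listing by passing a cursor (the after parameter, which names the last item already seen) and stops when no cursor comes back. The sketch below shows that loop in isolation. It assumes a hypothetical fetch_page(after, count) function standing in for the real HTTP request, and it illustrates the technique rather than reproducing PRAW's own code. The fake fetcher exists only so the example runs offline.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (items on the page, cursor for the next page or None).
Page = Tuple[List[str], Optional[str]]


def iter_listing(
    fetch_page: Callable[[Optional[str], int], Page],
    page_size: int = 100,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield items page by page, following the cursor like the "next" button.

    fetch_page(after, count) is a hypothetical callable returning up to `count`
    items that come after the cursor `after` (None means the newest page).
    With limit=None, pages are requested until the cursor runs out.
    """
    after: Optional[str] = None
    yielded = 0
    while limit is None or yielded < limit:
        # Ask for at most one page, and never more than we still need.
        want = page_size if limit is None else min(page_size, limit - yielded)
        items, after = fetch_page(after, want)
        for item in items:
            yield item
            yielded += 1
            if limit is not None and yielded >= limit:
                return
        if after is None or not items:
            return  # no "next" page left


def make_fake_fetch(total: int, calls: List[int]) -> Callable[[Optional[str], int], Page]:
    """Hypothetical offline stand-in for Reddit: `total` comments, cursor = index."""
    comments = [f"comment {i}" for i in range(total)]

    def fetch(after: Optional[str], count: int) -> Page:
        calls.append(count)  # record the size requested by each simulated call
        start = 0 if after is None else int(after)
        end = min(start + count, total)
        next_after = str(end) if end < total else None
        return comments[start:end], next_after

    return fetch


if __name__ == "__main__":
    calls: List[int] = []
    everything = list(iter_listing(make_fake_fetch(250, calls)))
    print(len(everything), len(calls))  # 250 3
```

With limit=None the loop keeps following the cursor until the server stops returning one, which matches what the PRAW docs describe for limit=None. In practice, Reddit listings are widely reported to end after roughly 1,000 items, so even an unbounded loop may not reach a subreddit's oldest comments.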

Answers (1)

IronManMark20

From the docs:

See UnauthenticatedReddit.get_comments() for complete usage.

That function accepts *args and **kwargs, and its documentation notes:

The additional parameters are passed directly into get_content(). Note: the url parameter cannot be altered.

Therefore, I looked at get_content(). One of its arguments is limit:

limit – the number of content entries to fetch. If limit <= 0, fetch the default for your account (25 for unauthenticated users). If limit is None, then fetch as many entries as possible (reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary).

The relevant case is limit=None. So my test was:

comments = subreddit.get_comments(limit=None)

That returned more than 30 comments (probably 100, but I was counting them by hand, so I stopped once I had checked 30).
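The limit rules quoted from the get_content() docs come down to simple arithmetic: choose a target count from limit, then split it into requests of at most 100 entries. Below is a minimal model of that reading of the docs. The request_sizes function and its available parameter are hypothetical, and this is not PRAW's implementation.

```python
from typing import List, Optional

MAX_PER_REQUEST = 100  # "reddit returns at most 100 per request"
DEFAULT_LIMIT = 25     # "25 for unauthenticated users"


def request_sizes(limit: Optional[int], available: int) -> List[int]:
    """Model how many entries each request would fetch under the quoted rules.

    `available` (how many entries the listing holds) is a hypothetical input
    that makes the limit=None case finite for this illustration.
    """
    if limit is None:
        target = available                      # fetch as many as possible
    elif limit <= 0:
        target = min(DEFAULT_LIMIT, available)  # fall back to the default
    else:
        target = min(limit, available)          # fetch what was asked, if it exists
    sizes: List[int] = []
    while target > 0:
        batch = min(MAX_PER_REQUEST, target)
        sizes.append(batch)
        target -= batch
    return sizes


if __name__ == "__main__":
    print(request_sizes(None, 230))  # [100, 100, 30]
    print(request_sizes(0, 230))     # [25]
    print(request_sizes(250, 1000))  # [100, 100, 50]
```

Under this model, the default case yields a single batch of 25, which is consistent with the 25 comments the question's get_comments() call returned.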