Hausra5

Reputation: 31

Retrieving all comments from a thread on Reddit

I’m new to APIs and to working with JSON, and would love some help here.

I know everything I’m trying to accomplish can be done using the PRAW library, but I’m trying to figure it out without PRAW.

I have a for loop that pulls post titles from a specific subreddit, puts them into a pandas data frame, and, once the limit is reached, sets the ‘after’ parameter to the last post ID so the next iteration fetches the next batch.

Everything worked perfectly, but when I tried the same technique on a specific thread to gather its comments, the ‘after’ parameter didn’t grab the next batch.

I’m assuming ‘after’ works differently with threads than with a subreddit’s posts. In the JSON I saw ‘more’ objects containing a list of IDs. Do I need to use these somehow? When I looked at the JSON for the thread, ‘after’ is ‘None’ even with the updated parameters.

Any idea on what I need to change here? It’s probably something simple.

Working code for getting the subreddit posts with limit 5:

params = {"t":"day","limit":5}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/stocks/new',
                            headers=headers, params = params)
    response = response.json()
    for post in response['data']['children']:
        name = post['data']['name']
        print('name',name)
    params['after'] = name
    print(params)

Giving the output:

name t3_lifixn
name t3_lifg68
name t3_lif6u2
name t3_lif5o2
name t3_lif3cm
{'t': 'day', 'limit': 5, 'after': 't3_lif3cm'}
name t3_lif26d
name t3_lievhr
name t3_liev9i
name t3_liepud
name t3_lie41e
{'t': 'day', 'limit': 5, 'after': 't3_lie41e'}

Code for the Reddit thread with limit 10:

params = {"limit":10}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/wallstreetbets/comments/lgrc39/',
                            params = params,headers=headers)
    response = response.json()
    for post in response[1]['data']['children']:
        name = post['data']['name']
        print(name)
    params['after'] = name
    print(params)

Giving the output:

t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}
t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}

Even though the limit was set to 10, it only returned 5 IDs before continuing the loop. Also, rather than using the updated 'after' parameter, the second request just started over from the first comment.

Upvotes: 2

Views: 2745

Answers (1)

Hausra5

Reputation: 31

I ended up figuring out how to do it. According to the documentation for Reddit's API, to pull more comments from a thread you have to compile a list of the IDs from the 'more' objects in the JSON. The comments form a nested tree, and a 'more' object looks like the following:

{'kind': 'more', 'data': {'count': 161, 'name': 't1_gmuram8', 'id': 'gmuram8', 'parent_id': 't1_gmt20i4', 'depth': 1, 'children': ['gmuram8', 'gmt6mf6', 'gmubxmr', 'gmt63gl', 'gmutw5j', 'gmtpitn', 'gmtoec3', 'gmtnel0', 'gmt4p79', 'gmupqhx', 'gmv70rm', 'gmtu2sj', 'gmt2vc7', 'gmtmjai', 'gmtje0b', 'gmtkzzj', 'gmt93n5', 'gmtvsqa', 'gmumhat', 'gmuj73q', 'gmtor7c', 'gmuqcwv', 'gmt3lxe', 'gmt4l78', 'gmum9cm', 'gmt857f', 'gmtjrz3', 'gmu0qcl', 'gmt9t9i', 'gmt8jc7', 'gmurron', 'gmt3ysv', 'gmt6neb', 'gmt4v3x', 'gmtoi6t']}}
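
Because the tree is nested, the 'more' objects can show up at any depth, so the IDs have to be gathered recursively. Here is a minimal sketch of that step. The helper name collect_more_ids and the sample data are made up for illustration; the sketch assumes each comment has kind 't1' and that its 'replies' field is either another listing or an empty string.

```python
def collect_more_ids(listing):
    """Walk a comment listing and return every ID found in its 'more' objects."""
    ids = []
    for child in listing["data"]["children"]:
        if child["kind"] == "more":
            # A 'more' object carries the IDs of comments that were left out.
            ids.extend(child["data"]["children"])
        elif child["kind"] == "t1":
            replies = child["data"].get("replies")
            # Assumption: a comment without replies has an empty string here.
            if isinstance(replies, dict):
                ids.extend(collect_more_ids(replies))
    return ids


# Hand-made sample in the same shape as the thread JSON (not real API output).
sample = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {"id": "aaa111", "replies": {
            "kind": "Listing",
            "data": {"children": [
                {"kind": "t1", "data": {"id": "bbb222", "replies": ""}},
                {"kind": "more", "data": {"children": ["ccc333", "ddd444"]}},
            ]},
        }}},
        {"kind": "more", "data": {"children": ["eee555"]}},
    ]},
}

print(collect_more_ids(sample))
```

With the thread request from the question, the comment listing is the second element of the response, so the call would be collect_more_ids(response[1]).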

When making the GET request, use the following URL and format (the URL must be passed as a string, along with the same OAuth headers as before):

requests.get('https://oauth.reddit.com/api/morechildren/.json?api_type=json&link_id=t3_lgrc39&children=gmt20i4,gmuram8....etc', headers=headers)
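
Rather than pasting the IDs into the URL by hand, the query string can be built from a list and handed to requests through params. This is a rough sketch only. The helper name morechildren_params is hypothetical, and the batch size of 100 is an assumption to keep each request small, so check the API documentation for the actual limit.

```python
def morechildren_params(link_id, child_ids, batch_size=100):
    """Yield one params dict per batch of comment IDs for /api/morechildren."""
    for start in range(0, len(child_ids), batch_size):
        batch = child_ids[start:start + batch_size]
        yield {
            "api_type": "json",
            "link_id": link_id,            # full name of the thread, e.g. t3_lgrc39
            "children": ",".join(batch),   # comma-separated IDs without the t1_ prefix
        }


# Example with three of the IDs from the 'more' object above, two per request.
for params in morechildren_params("t3_lgrc39", ["gmuram8", "gmt6mf6", "gmubxmr"], batch_size=2):
    print(params)
    # Each dict would then be sent with the same OAuth headers as in the question:
    # requests.get('https://oauth.reddit.com/api/morechildren', headers=headers, params=params)
```

Sending the batches one after another avoids building a very long URL when a thread has many hidden comments.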

Upvotes: 1
