Reputation: 31
I’m new to APIs and working with JSON and would love some help here.
I know everything I’m trying to accomplish can be done using the PRAW library, but I’m trying to figure it out without PRAW.
I have a for loop that pulls post titles from a specific subreddit, puts them into a pandas data frame, and, once the limit is reached, sets the ‘after’ parameter to the last post ID so the next iteration fetches the next batch.
Everything worked perfectly, but when I tried the same technique on a specific thread to gather its comments, the ‘after’ parameter doesn’t grab the next batch.
I’m assuming ‘after’ works differently with threads than with a subreddit’s posts. I saw ‘more’ in the JSON with a list of IDs. Do I need to use this somehow? When I looked at the JSON for the thread, ‘after’ is ‘none’ even with the updated parameters.
Any idea on what I need to change here? It’s probably something simple.
Working code for getting the subreddit posts with limit 5:
import requests

# headers is assumed to already hold the OAuth bearer token and User-Agent
params = {"t": "day", "limit": 5}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/stocks/new',
                            headers=headers, params=params)
    response = response.json()
    for post in response['data']['children']:
        name = post['data']['name']
        print('name', name)
    # use the last post's fullname as the cursor for the next batch
    params['after'] = name
    print(params)
Giving the output:
name t3_lifixn
name t3_lifg68
name t3_lif6u2
name t3_lif5o2
name t3_lif3cm
{'t': 'day', 'limit': 5, 'after': 't3_lif3cm'}
name t3_lif26d
name t3_lievhr
name t3_liev9i
name t3_liepud
name t3_lie41e
{'t': 'day', 'limit': 5, 'after': 't3_lie41e'}
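As an aside, a listing response also carries its own cursor in `data.after`, so the loop does not have to remember the last post's name. Below is a minimal sketch of that idea; `iter_listing_pages` and `fetch_page` are hypothetical names made up for illustration, and `fetch_page` is assumed to wrap the `requests.get(...).json()` call from the code above.

```python
from typing import Callable, Iterator, Optional


def iter_listing_pages(fetch_page: Callable[[dict], dict],
                       params: dict,
                       max_pages: int = 2) -> Iterator[list]:
    """Yield the children of each page, following the listing's own cursor.

    fetch_page is a hypothetical callable: it takes the query params and
    returns the decoded JSON listing.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    for _ in range(max_pages):
        listing = fetch_page(params)
        yield listing['data']['children']
        cursor: Optional[str] = listing['data'].get('after')
        if cursor is None:  # the listing reports no further pages
            break
        params['after'] = cursor
```

With the subreddit request, `fetch_page` could be something like `lambda p: requests.get(url, headers=headers, params=p).json()`, where `url` is the `/r/stocks/new` endpoint.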
Code for the Reddit thread with limit 10:
params = {"limit": 10}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/wallstreetbets/comments/lgrc39/',
                            params=params, headers=headers)
    response = response.json()
    # response[0] is the post itself; response[1] holds the comment listing
    for post in response[1]['data']['children']:
        name = post['data']['name']
        print(name)
    params['after'] = name
    print(params)
Giving the output:
t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}
t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}
Even though the limit was set to 10, it only returned 5 IDs before continuing the loop. Also, rather than honoring the updated 'after' parameter, it just started over from the first comment.
Upvotes: 2
Views: 2745
Reputation: 31
I ended up figuring out how to do it. According to the documentation for Reddit's API, when you are in a thread and want to pull more comments, you have to compile a list of the IDs from the 'more' sections in the JSON. It's a nested tree, and each 'more' node looks like the following:
{'kind': 'more', 'data': {'count': 161, 'name': 't1_gmuram8', 'id': 'gmuram8', 'parent_id': 't1_gmt20i4', 'depth': 1, 'children': ['gmuram8', 'gmt6mf6', 'gmubxmr', 'gmt63gl', 'gmutw5j', 'gmtpitn', 'gmtoec3', 'gmtnel0', 'gmt4p79', 'gmupqhx', 'gmv70rm', 'gmtu2sj', 'gmt2vc7', 'gmtmjai', 'gmtje0b', 'gmtkzzj', 'gmt93n5', 'gmtvsqa', 'gmumhat', 'gmuj73q', 'gmtor7c', 'gmuqcwv', 'gmt3lxe', 'gmt4l78', 'gmum9cm', 'gmt857f', 'gmtjrz3', 'gmu0qcl', 'gmt9t9i', 'gmt8jc7', 'gmurron', 'gmt3ysv', 'gmt6neb', 'gmt4v3x', 'gmtoi6t']}}
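Since the 'more' nodes can sit at any depth of the tree, one way to compile the list is a small recursive walk. This is only a sketch: `collect_more_ids` is a hypothetical helper name, and it assumes the comment listing shape shown in the question (`response[1]['data']['children']`), with nested replies stored under each comment's `replies` key.

```python
def collect_more_ids(children: list) -> list:
    """Return every ID listed under a 'more' node, searching nested replies."""
    ids = []
    for child in children:
        data = child['data']
        if child['kind'] == 'more':
            # a 'more' node lists the IDs of comments that were not loaded
            ids.extend(data['children'])
        else:
            replies = data.get('replies')
            # Reddit sends an empty string when a comment has no replies
            if isinstance(replies, dict):
                ids.extend(collect_more_ids(replies['data']['children']))
    return ids
```

For the thread above it would be called as `collect_more_ids(response[1]['data']['children'])`.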
When making the GET request, use the following URL and format:
requests.get('https://oauth.reddit.com/api/morechildren/.json?api_type=json&link_id=t3_lgrc39&children=gmt20i4,gmuram8....etc', headers=headers)
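Rather than building that query string by hand, the IDs can be joined with commas and passed through `params`. A rough sketch follows; `morechildren_params` is a hypothetical helper, and the batch size of 100 is an assumption on my part (it is not stated in this post), so adjust it if the API complains.

```python
def morechildren_params(link_id: str, ids: list, batch_size: int = 100) -> list:
    """Split ids into batches and return one params dict per request.

    batch_size is an assumed cap on IDs per call, not a documented value.
    """
    return [
        {
            'api_type': 'json',
            'link_id': link_id,  # fullname of the thread, e.g. 't3_lgrc39'
            'children': ','.join(ids[i:i + batch_size]),
        }
        for i in range(0, len(ids), batch_size)
    ]
```

Each dict can then be sent with `requests.get('https://oauth.reddit.com/api/morechildren/.json', headers=headers, params=p)`, one request per batch.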
Upvotes: 1