Charles Von Franklyn

Reputation: 47

How to properly scrape a JSON response from Reddit?

I am trying to save all "author" entries from the JSON linked below into a list, but I am very new to Python. Can someone kindly point me in the right direction?

The JSON: https://codebeautify.org/jsonviewer/cb0d0a91

Trying to scrape a Reddit thread:

import requests
import json

url ="https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"

r = requests.get(url, headers={'User-agent': 'Chrome'})
d = r.json()

scrapedids = []

for child in d['data']['children']:
    scrapedids.append(child['data']['author'])

print (scrapedids)

If I switch the URL from a Reddit post to the subreddit, then it works. For example, it works if I set:

url = ("https://www.reddit.com/r/easternshoremd.json")

I believe the issue is my lack of understanding of how the JSON is structured (the tree or hierarchy, whatever it's called). I've been stuck for a few hours and would appreciate any assistance.

The error:

Traceback (most recent call last):
  File "/home/usr/PycharmProjects/untitled/delete.py", line 14, in <module>
    for child in d['data']['children']:
TypeError: list indices must be integers or slices, not str

Upvotes: 1

Views: 1508

Answers (1)

de1

Reputation: 3124

You included a link to the JSON, which is good. It shows that the root is an array rather than an object, so indexing it with d['data'] raises the TypeError you are seeing.
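To see the difference without hitting the network, here is a minimal sketch. The thread variable is a hand-made, heavily trimmed stand-in for the response (an assumption for illustration, not the real payload), and the author names are made up.

```python
# Hypothetical, trimmed stand-in for a thread's JSON: the root is a list
# of listings, not a single dict.
thread = [
    {"kind": "Listing", "data": {"children": [
        {"kind": "t3", "data": {"author": "poster"}},
    ]}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "commenter"}},
    ]}},
]

# Checking the root type first tells you how to index it.
root_type = type(thread).__name__
print(root_type)

# Indexing a list with a string key reproduces the error from the question.
try:
    thread["data"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Indexing by position first, then by key, works.
first_author = thread[0]["data"]["children"][0]["data"]["author"]
print(first_author)
```

Printing type(r.json()) on the real response is a quick way to check which of the two shapes you are dealing with.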

Therefore your code should look more like:

import requests

url = "https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"

r = requests.get(url, headers={'User-agent': 'Chrome'})
# For a thread, the decoded JSON is a list of listings, not a single dict.
listings = r.json()

scrapedids = []

# Walk every listing, then every child inside it.
for listing in listings:
    for child in listing['data']['children']:
        scrapedids.append(child['data']['author'])

print(scrapedids)

Note that I renamed d to listings, because each element of the array has a kind attribute of 'Listing'.
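If you want one piece of code that copes with both the subreddit URL (root is an object) and the thread URL (root is an array), something along these lines might work. It is only a sketch: collect_authors is a hypothetical helper name, the sample data is made up, and it assumes nested replies sit under a 'replies' key and that some children (for example 'more' placeholders) may have no 'author' at all.

```python
def collect_authors(node):
    """Return every 'author' found in a Reddit-style listing structure."""
    authors = []
    if isinstance(node, list):
        # Thread responses: the root is a list of listings.
        for item in node:
            authors.extend(collect_authors(item))
    elif isinstance(node, dict):
        data = node.get("data", {})
        # Posts and comments carry an author; listings and placeholders do not.
        if "author" in data:
            authors.append(data["author"])
        # A listing keeps its items under data['children'].
        for child in data.get("children", []):
            authors.extend(collect_authors(child))
        # Assumed shape: replies are either an empty string or a nested listing.
        replies = data.get("replies")
        if isinstance(replies, dict):
            authors.extend(collect_authors(replies))
    # Anything else (for example a bare ID string) contributes nothing.
    return authors


# Made-up sample that mimics the assumed shape of a thread response.
sample = [
    {"kind": "Listing", "data": {"children": [
        {"kind": "t3", "data": {"author": "poster"}},
    ]}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "alice", "replies": {
            "kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"author": "bob", "replies": ""}},
            ]},
        }}},
        {"kind": "more", "data": {"children": ["abc123"]}},
    ]}},
]

authors = collect_authors(sample)
print(authors)
```

With the real response you would call collect_authors(r.json()) instead of passing sample. Unlike the loop above, this also descends into nested replies, so it returns more names than the top-level comments alone.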

Upvotes: 3
