Reputation: 47
I am trying to save all "author" entries from the JSON linked below into a list, but I am very new to Python. Can someone kindly point me in the right direction?
The JSON: https://codebeautify.org/jsonviewer/cb0d0a91
I am trying to scrape a Reddit thread:
import requests
import json
url ="https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"
r = requests.get(url, headers={'User-agent': 'Chrome'})
d = r.json()
scrapedids = []
for child in d['data']['children']:
    scrapedids.append(child['data']['author'])
print(scrapedids)
If I switch the URL from a Reddit post to the subreddit, then it works. For example, it works if I set:
url = ("https://www.reddit.com/r/easternshoremd.json")
I believe the issue is my lack of understanding of the tree structure of the JSON. I've been stuck for a few hours and would appreciate any assistance.
The error:
Traceback (most recent call last):
  File "/home/usr/PycharmProjects/untitled/delete.py", line 14, in <module>
    for child in d['data']['children']:
TypeError: list indices must be integers or slices, not str
Upvotes: 1
Views: 1508
Reputation: 3124
You included a link to the JSON, which is good. It shows that the root is an array.
Therefore your code should look more like:
import requests
import json
url ="https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"
r = requests.get(url, headers={'User-agent': 'Chrome'})
listings = r.json()
scrapedids = []
for listing in listings:
    for child in listing['data']['children']:
        scrapedids.append(child['data']['author'])
print(scrapedids)
Note that I renamed d to listings, which relates to the kind attribute ('listing').
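To make the difference in shape concrete, here is a minimal offline sketch. The sample data and the helper name collect_authors are made up for illustration (they are not real Reddit output or part of any library); the sketch only assumes that a thread response is a list of listing objects while a subreddit response is a single listing object, as described in the question and above.

```python
# Hypothetical, trimmed-down stand-in for a thread response:
# the root is a list of listing objects, not a dict.
sample_thread = [
    {"data": {"children": [{"data": {"author": "alice"}}]}},
    {"data": {"children": [
        {"data": {"author": "bob"}},
        {"data": {"author": "carol"}},
    ]}},
]


def collect_authors(payload):
    """Return the author of every child, for a list root or a single dict root."""
    # Wrap a single listing in a list so both shapes go through the same loop.
    listings = payload if isinstance(payload, list) else [payload]
    authors = []
    for listing in listings:
        for child in listing["data"]["children"]:
            authors.append(child["data"]["author"])
    return authors


print(collect_authors(sample_thread))     # list root, like the thread URL
print(collect_authors(sample_thread[1]))  # dict root, like the subreddit URL
```

With real data, the same helper could be called as collect_authors(r.json()) for either URL. Indexing a list with a string such as 'data' is what raises the TypeError in the question, so checking the root type first avoids it.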
Upvotes: 3