Reputation: 1547
I have a Python 2.7 script that scrapes Facebook and captures some JSON data from each page. The JSON data contains personal information. A sample is below:
{
"id": "4",
"name": "Mark Zuckerberg",
"first_name": "Mark",
"last_name": "Zuckerberg",
"link": "http://www.facebook.com/zuck",
"username": "zuck",
"gender": "male",
"locale": "en_US"
}
The JSON keys can vary from page to page. The example above lists all the possible keys, but sometimes a key such as 'username' does not exist, and I may encounter JSON data such as:
{
"id": "6",
"name": "Billy Smith",
"first_name": "Billy",
"last_name": "Smith",
"gender": "male",
"locale": "en_US"
}
I want to populate a database table with this data. My code is below:
results_json = simplejson.loads(scraperwiki.scrape(profile_url))
for result in results_json:
    profile = dict()
    try:
        profile['id'] = int(results_json['id'])
    except:
        profile['id'] = ""
    try:
        profile['name'] = results_json['name']
    except:
        profile['name'] = ""
    try:
        profile['first_name'] = results_json['first_name']
    except:
        profile['first_name'] = ""
    try:
        profile['last_name'] = results_json['last_name']
    except:
        profile['last_name'] = ""
    try:
        profile['link'] = results_json['link']
    except:
        profile['link'] = ""
    try:
        profile['username'] = results_json['username']
    except:
        profile['username'] = ""
    try:
        profile['gender'] = results_json['gender']
    except:
        profile['gender'] = ""
    try:
        profile['locale'] = results_json['locale']
    except:
        profile['locale'] = ""
The reason I have so many try/except blocks is to account for keys that don't exist on a given page. Nonetheless, this seems like a really clumsy and messy way to handle the issue.
If I remove these try/except clauses and my scraper encounters a missing key, it raises a KeyError such as "KeyError: 'username'" and my script stops running.
Is there a smarter way to handle these errors so that the script continues when a key is missing?
I've tried creating a list of the JSON keys and iterating through them with an if clause, but I just can't figure it out.
Upvotes: 4
Views: 4805
Reputation: 1121854
Use the .get() method instead:
>>> a = {'bar': 'eggs'}
>>> a['foo']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'foo'
>>> a.get('foo', 'default value')
'default value'
>>> a.get('bar', 'default value')
'eggs'
The .get() method returns the value for the requested key, or the default value if the key is missing.
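Applied to the profile from the question, this could look like the sketch below. The FIELDS list and the build_profile helper are hypothetical names introduced here for illustration, and the sample data is copied from the question rather than scraped. Note that 'id' stays a string in this version; the int() conversion from the question would need its own handling.

```python
import json

# Hypothetical list of expected keys, taken from the sample JSON in the question.
FIELDS = ['id', 'name', 'first_name', 'last_name',
          'link', 'username', 'gender', 'locale']


def build_profile(result):
    """Return a dict with every expected key, using '' where a key is missing."""
    return dict((key, result.get(key, '')) for key in FIELDS)


# Sample data from the question; 'link' and 'username' are absent.
sample = json.loads(
    '{"id": "6", "name": "Billy Smith", "first_name": "Billy", '
    '"last_name": "Smith", "gender": "male", "locale": "en_US"}'
)
profile = build_profile(sample)
```

Here profile['username'] and profile['link'] fall back to the empty string instead of raising a KeyError, and the eight try/except blocks collapse into one expression.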
Or you can create a new dict with empty strings for each key and use .update() on it:
profile = dict.fromkeys('id name first_name last_name link username gender locale'.split(), '')
profile.update(result)
dict.fromkeys() creates a dictionary with all the keys you request set to a given default value ('' in the above example), then we use .update() to copy all keys and values from the result dictionary, replacing anything already there.
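One thing to keep in mind is that .update() also copies any keys in result that are not in the default set. If the database table has a fixed set of columns, a variant like the following sketch may be safer. The build_profile helper and the 'hometown' key are hypothetical, used only to show an unexpected key being ignored.

```python
# Expected keys, as in the example above.
FIELDS = 'id name first_name last_name link username gender locale'.split()


def build_profile(result):
    """Start from empty-string defaults, then copy over only the expected keys."""
    profile = dict.fromkeys(FIELDS, '')
    # dict.update() accepts an iterable of (key, value) pairs, so unexpected
    # keys can be filtered out before they are copied.
    profile.update((k, v) for k, v in result.items() if k in profile)
    return profile


# 'hometown' is a made-up extra key that is not in FIELDS.
result = {'id': '6', 'name': 'Billy Smith', 'hometown': 'Springfield'}
profile = build_profile(result)
```

The resulting profile has exactly the eight expected keys: 'id' and 'name' come from result, the rest stay as empty strings, and 'hometown' is dropped.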
Upvotes: 10