Reputation: 2446
I'm really close to having a script that fetches JSON from the New York Times API, then converts it to CSV. However, occasionally I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 21: ordinal not in range(128)
I think I could avoid this altogether if I converted the output to UTF-8, but I am unsure how to do so. Here is my Python script:
import urllib2
import json
import csv

outfile_path='/NYTComments.csv'
writer = csv.writer(open(outfile_path, 'w'))

url = urllib2.Request('http://api.nytimes.com/svc/community/v2/comments/recent?api-key=ea7aac6c5d0723d7f1e06c8035d27305:5:66594855')
parsed_json = json.load(urllib2.urlopen(url))
print parsed_json

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(str(comment['commentBody']))
    row.append(str(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
Upvotes: 1
Views: 3129
Reputation: 13911
A few things...
I don't know anything about the New York Times API, but I would guess you probably shouldn't publish a code snippet with your "api-key" in it. Just a guess on this point (I've never used this API before).
If you look, the API tells you the encoding. You are getting the following back in the header:
Content-Type=application/json; charset=UTF-8
Googling "python and UnicodeEncodeError" will give you a lot of help. But here, the problem is most likely the str() calls on the comment fields. In Python 2, calling str() on a unicode value encodes it with the 'ascii' codec, so any character above code point 127 (such as u'\u201c', a left double quotation mark) goes boom and raises the error you are seeing. Here is a pretty good blog post on the topic. It might help you to read over it.
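To make the failure concrete, here is a minimal sketch (not part of the original script) that reproduces it with the same character from the traceback. The variable names are illustrative only. In Python 2, str() on a unicode value does the equivalent of the .encode('ascii') call below.

```python
# Illustrative sketch: the left double quotation mark from the traceback.
quote = u'\u201c'

# Encoding to ASCII fails because U+201C is outside the 0-127 range.
try:
    quote.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 succeeds and yields three bytes.
utf8_bytes = quote.encode('utf-8')
```

So the character itself is fine; it is only the implicit choice of the ASCII codec that breaks.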
Edit: This solution works for me:
for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(comment['commentBody'].encode('UTF-8', 'replace'))
    row.append(comment['commentTitle'].encode('UTF-8', 'replace'))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
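For comparison, here is a hedged sketch of the same loop on Python 3, where str is already Unicode and the encoding is chosen when the file is opened rather than per field. This is not the original poster's setup (the script above is Python 2); the function name write_comments and the sample data are made up for illustration, and only the field names come from the question.

```python
import csv
import io


def write_comments(comments, out):
    """Write selected comment fields as CSV rows to a text-mode file object."""
    writer = csv.writer(out)
    for comment in comments:
        writer.writerow([
            comment['commentSequence'],
            comment['commentBody'],
            comment['commentTitle'],
            comment['approveDate'],
        ])


# Made-up sample data standing in for parsed_json['results']['comments'].
sample = [{
    'commentSequence': 1,
    'commentBody': '\u201cHello\u201d',
    'commentTitle': 'Title',
    'approveDate': '1356998400',
}]

# An in-memory buffer keeps the sketch self-contained. A real script would
# use open(outfile_path, 'w', encoding='utf-8', newline='') instead.
buffer = io.StringIO(newline='')
write_comments(sample, buffer)
csv_text = buffer.getvalue()
```

With this approach no .encode() calls are needed in the loop, because the file object handles the conversion to UTF-8 on write.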
Upvotes: 1
Reputation: 17659
Replace the second and third calls to str() with unicode().
for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(unicode(comment['commentBody']))
    row.append(unicode(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
Upvotes: 0