Reputation: 313
I am trying to extract some data from a JSON file that contains tweets and write it to a CSV file. The file contains all kinds of characters, which I'm guessing is why I get this error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'
I guess I have to convert the output to UTF-8 before writing the CSV file, but I have not been able to do that. I have found similar questions here on Stack Overflow, but I've not been able to adapt the solutions to my problem. (I should add that I am not really familiar with Python. I'm a social scientist, not a programmer.)
import csv
import json

fieldnames = ['id', 'text']

with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)

    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            'text': tweet['text'],
            'id': tweet['id'],
        }
        writer.writerow(output)
Upvotes: 4
Views: 12570
Reputation: 180401
You just need to encode the text as UTF-8 before passing it to the writer:
for line in f:
    tweet = json.loads(line)
    user = tweet['user']
    output = {
        'text': tweet['text'].encode("utf-8"),
        'id': tweet['id'],
    }
    writer.writerow(output)
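If more text fields are added to the row later, each of them needs the same treatment. A minimal sketch of one way to do that, assuming a hypothetical helper named encode_row (not part of the csv module) that encodes every text value in a row and leaves other values such as the numeric id untouched:

```python
def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text value encoded to bytes.

    encode_row is a hypothetical helper for illustration, meant for
    Python 2, where csv expects byte strings rather than unicode objects.
    """
    text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3
    encoded = {}
    for key, value in row.items():
        if isinstance(value, text_type):
            encoded[key] = value.encode(encoding)
        else:
            # Non-text values (for example the integer id) pass through.
            encoded[key] = value
    return encoded


row = encode_row({'id': 1, 'text': u'\u2026'})
```

With this in place, the loop could call writer.writerow(encode_row(output)) instead of encoding each field by hand.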
In Python 2, the csv module does not support writing unicode objects. From the documentation:
Note: This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
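That limitation is specific to Python 2. For comparison, here is a rough sketch of the same loop under Python 3, where the csv module accepts text directly and the encoding is chosen when the file is opened. The function name tweets_to_csv and the sample tweet are made up for illustration, and an in-memory buffer stands in for the real output file:

```python
import csv
import io
import json


def tweets_to_csv(lines, out):
    """Write the id and text of each JSON-encoded tweet in `lines` to `out`."""
    writer = csv.DictWriter(
        out, fieldnames=['id', 'text'], delimiter=',', quoting=csv.QUOTE_ALL)
    for line in lines:
        tweet = json.loads(line)
        # No .encode() call here: Python 3's csv module works with str.
        writer.writerow({'id': tweet['id'], 'text': tweet['text']})


# In-memory demonstration with an invented tweet. With real files one would
# open them as open('MY_SOURCE_FILE', 'r', encoding='utf-8') and
# open('MY_OUTPUT', 'a', encoding='utf-8', newline='').
sample = ['{"id": 1, "text": "caf\\u00e9 \\u2026", "user": {}}']
buffer = io.StringIO()
tweets_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Passing newline='' when opening the output file is what the csv documentation recommends, so that the writer controls the line endings itself.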
Upvotes: 6