Reputation: 670
I am getting lost somewhere in converting unicode to UTF-8. I am trying to define some JSON containing unicode characters and write it to a file. When printing to the terminal, the character is represented as '\u2606'. When I look at the file, the character is encoded as '\\u2606'; note the double backslash. Could someone point me in the right direction regarding these encoding issues?
# encoding=utf8
import json
data = {"summary" : u"This is a unicode character: ☆"}
print data
decoded_data = unicode(data)
print decoded_data
with open('decoded_data.json', 'w') as outfile:
    json.dump(decoded_data, outfile)
I tried adding the following snippet to the head of my file, but this did not help either.
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
Upvotes: 1
Views: 1122
Reputation: 195
I think you can also refer to this link. It is really useful.
Upvotes: 0
Reputation: 42758
First, you are printing the representation of a dictionary, and Python 2 uses only ASCII characters there, escaping any other character as \uxxxx. The same goes for json.dump, which by default also emits only ASCII characters. You can force it to keep the unicode characters with:
json_data = json.dumps(data, ensure_ascii=False)
with open('decoded_data.json', 'w') as outfile:
    outfile.write(json_data.encode('utf8'))
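For comparison, here is a minimal sketch of the same idea, assuming Python 3 rather than the Python 2 used in the question. It also tries to reproduce where the double backslash likely comes from: the dictionary was turned into a string before being dumped. The ascii() call is only my stand-in for the question's unicode(data), and the temporary directory is an assumption made so the example cleans up nothing important.

```python
import json
import os
import tempfile

data = {"summary": "This is a unicode character: \u2606"}

# Stand-in for the question's unicode(data): ascii() returns the dict's repr
# with non-ASCII characters escaped, much like Python 2 does.
stringified = ascii(data)

# Dumping that string escapes the backslash again, giving \\u2606.
double_escaped = json.dumps(stringified)

# Dumping the dict itself gives a single escape, \u2606 (the default).
escaped = json.dumps(data)

# ensure_ascii=False keeps the character itself in the output.
readable = json.dumps(data, ensure_ascii=False)

# Write with an explicit encoding so the result does not depend on the locale.
path = os.path.join(tempfile.mkdtemp(), "decoded_data.json")
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(readable)

# Reading the file back should return the original dictionary.
with open(path, encoding="utf-8") as infile:
    round_tripped = json.load(infile)
```

If this matches what happens in the question, the fix has two parts: pass data itself to json.dump or json.dumps instead of a stringified copy, and set ensure_ascii=False if the file should contain the actual character rather than an escape. Both forms are valid JSON and load back to the same value.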
Upvotes: 1