Wouter

Reputation: 670

Python encoding unicode<>utf-8

I am getting lost somewhere in converting unicode to UTF-8. I am trying to define some JSON containing unicode characters and write it to a file. When printed to the terminal, the character is represented as '\u2606'. In the file, the character is encoded as '\\u2606' (note the double backslash). Could someone point me in the right direction regarding these encoding issues?

# encoding=utf8

import json

data = {"summary" : u"This is a unicode character: ☆"}
print data

decoded_data = unicode(data)
print decoded_data

with open('decoded_data.json', 'w') as outfile:
    json.dump(decoded_data, outfile)

I also tried adding the following snippet to the head of my file, but that did not help either.

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

Upvotes: 1

Views: 1122

Answers (2)

Hardik Sachdeva

Reputation: 195

You can also refer to this link, which is really useful:

Set Default Encoding

Upvotes: 0

Daniel

Reputation: 42758

First, you are printing the representation of a dictionary. In Python 2 that representation uses only ASCII characters and escapes every other character as \uxxxx.
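To see where the double backslash in the question comes from, here is a small sketch. It assumes Python 3, where ascii() stands in for what Python 2's repr() and unicode(data) produce; the variable names as_text, from_dict and from_text are only illustrative.

```python
import json

data = {"summary": u"This is a unicode character: \u2606"}

# ascii() mimics Python 2's repr(): the star becomes the six characters \u2606.
as_text = ascii(data)

# Dumping the dict escapes the star once (\u2606).
from_dict = json.dumps(data)

# Dumping the string form escapes the literal backslash as well (\\u2606).
from_text = json.dumps(as_text)

print(from_dict)
print(from_text)
```

The second line printed contains the doubled backslash, because JSON treats the backslash in the already-escaped string as ordinary text that must itself be escaped.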

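The same applies to json.dump, which by default emits only ASCII characters. Also serialize data itself rather than unicode(data), which is a string holding the dictionary's representation. You can make json.dumps keep the unicode characters with ensure_ascii=False and then encode the result as UTF-8 yourself: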

json_data = json.dumps(data, ensure_ascii=False)
with open('decoded_data.json', 'w') as outfile:
    outfile.write(json_data.encode('utf8'))
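For comparison, a minimal sketch of the same idea on Python 3 (an assumption, since the question uses Python 2), where the file object handles the UTF-8 encoding. The temporary directory and the path variable are hypothetical and only keep the example self-contained.

```python
import io
import json
import os
import tempfile

data = {"summary": u"This is a unicode character: \u2606"}

# Hypothetical output path in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "decoded_data.json")

# The file object encodes to UTF-8; ensure_ascii=False keeps the star unescaped.
with io.open(path, "w", encoding="utf-8") as outfile:
    json.dump(data, outfile, ensure_ascii=False)

# Read the raw bytes back to inspect what was actually written.
with io.open(path, "rb") as infile:
    raw = infile.read()
```

Here raw holds the star as its UTF-8 byte sequence rather than as a \u2606 escape, and decoding and parsing it gives back the original dictionary.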

Upvotes: 1
