Anna Jokela

Reputation: 3

Can't correctly encode JSON file in Python

This is probably something very simple and I know that there are tons of similar cases like mine here on SO, but I just can't figure out how to fix this. I'm still rather new to Python.

Problem

I have a JSON file (expr.json) with the following contents:

{
    "vowel": "a|e|i|o|u|y|ä|ö",
    "consonant": "b|c|d|f|g|h|j|k|l|m|n|p|r|s|š|t|v|z|ž"
}

I want to read the file and parse its contents using Python's json module. Later, I want to compile the values of the keys using re.compile. Here is my code (main.py):

#!/usr/bin/python
# vim: set fileencoding=utf-8 :

import json

myfile = open('expr.json')
data = myfile.read()
myfile.close()

json_data = json.loads(data)
print json_data    # {u'consonant': u'b|c|d|f|g|h|j|k|l|m|n|p|r|s|\u0161|t|v|z|\u017e', u'vowel': u'a|e|i|o|u|y|\xe4|\xf6'}

But when I try to access 'vowel':

print json_data['vowel']

I get the following error message:

Traceback (most recent call last):
  File "/path to main.py", line 11, in
    print json_data['vowel']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in range(128)
[Finished in 0.1s with exit code 1]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)

What I have tried

1) Trying to encode string before calling json.loads using data.encode('utf-8') => Still the same error message

2) Replacing the error-causing characters (ä, ö) with their escaped versions (\u00E4, \u00F6) => No error, but when I compile the values using re.compile they do not work as expected (the patterns do not match the escaped characters)

3) Escaping characters using double backslash \\ => Still the same error message


I am using Python version 2.7.2 on Mac OS X. My editor is Sublime Text 2, and I read the values from my editor's built-in console. I come from the world of JavaScript, where I don't have the same problem.

Thank you in advance, and I'm terribly sorry if my question is a duplicate!

Edit 1: Added the full error message given by Sublime Text's console.

Upvotes: 0

Views: 2608

Answers (1)

jfs

Reputation: 414089

If you try

print repr(json_data['vowel'])

you'll see that the value is shown without an error, i.e., the problem is not json but printing Unicode. Try

print u"\xe4"

It should produce the same UnicodeEncodeError. Configure your editor to allow printing Unicode from Python. You could try setting the PYTHONIOENCODING=utf-8 environment variable for the editor's built-in console (or use whichever encoding the console actually expects).
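
To check that the failure comes from the output encoding rather than from json, the encode step can be done by hand. This is a minimal sketch, assuming expr.json is UTF-8 encoded. The inline bytes stand in for the file contents, and the names raw, ascii_ok and utf8_bytes are only illustrative:

```python
import json

# UTF-8 bytes standing in for (part of) expr.json; an assumption for this sketch.
raw = b'{"vowel": "a|e|i|o|u|y|\xc3\xa4|\xc3\xb6"}'

# Decode the bytes to text first, then parse; the parsed value is a Unicode string.
json_data = json.loads(raw.decode("utf-8"))
vowel = json_data["vowel"]

# Encoding with the ASCII codec fails on the non-ASCII character,
# which is the same kind of error as in the traceback.
try:
    vowel.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 succeeds whatever the console reports.
utf8_bytes = vowel.encode("utf-8")
```

If ascii_ok ends up False while utf8_bytes is produced without an exception, the data was parsed correctly and only the console's output encoding needs fixing.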

Unrelated to your issue, you could slightly simplify loading the UTF-8 encoded JSON file:

import json

with open("expr.json", "rb") as file:
    json_data = json.load(file)
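
Since the goal is to pass the values to re.compile, here is a hedged sketch of that step. It assumes that both the pattern and the text being searched are Unicode strings. The sample word and the variable names are made up for illustration. The JSON text below uses \u00e4 and \u00f6 escapes, which json decodes to the same characters as the literal ä and ö:

```python
import json
import re

# JSON text using \u escapes instead of literal non-ASCII characters.
text = u'{"vowel": "a|e|i|o|u|y|\\u00e4|\\u00f6"}'
json_data = json.loads(text)

# The parsed value already contains the real characters, not the escapes.
vowel_re = re.compile(json_data["vowel"], re.UNICODE)

# Search a Unicode string (a sample word, chosen only for illustration).
matches = vowel_re.findall(u"\xe4iti")
```

If the text being searched is a byte string in Python 2, decoding it first (for example with .decode('utf-8')) may be the piece that is missing when the escaped patterns appear not to match.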

Upvotes: 1
