This is probably something very simple and I know that there are tons of similar cases like mine here on SO, but I just can't figure out how to fix this. I'm still rather new to Python.
I have a JSON file (expr.json) with the following contents:
{
"vowel": "a|e|i|o|u|y|ä|ö",
"consonant": "b|c|d|f|g|h|j|k|l|m|n|p|r|s|š|t|v|z|ž"
}
I want to read the file and parse its contents using Python's json module. Later I want to compile the values of the keys with re.compile. Here is my code (main.py):
#!/usr/bin/python
# vim: set fileencoding=utf-8 :
import json
myfile = open('expr.json')
data = myfile.read()
myfile.close()
json_data = json.loads(data)
print json_data # {u'consonant': u'b|c|d|f|g|h|j|k|l|m|n|p|r|s|\u0161|t|v|z|\u017e', u'vowel': u'a|e|i|o|u|y|\xe4|\xf6'}
But when I try to print the value of 'vowel':
print json_data['vowel']
I get the following error message:
Traceback (most recent call last):
  File "/path to main.py", line 11, in <module>
    print json_data['vowel']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in range(128)
[Finished in 0.1s with exit code 1]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)
What I have tried so far:
1) Encoding the string with data.encode('utf-8') before calling json.loads => Still the same error message
2) Replacing the characters that cause the error (ä, ö) with their escape sequences (\u00E4, \u00F6) => No error, but when I compile them with re.compile they don't work as expected (they don't match the escaped characters)
3) Escaping characters using double backslash \\ => Still the same error message
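When parsed by the json module, a JSON escape such as \u00e4 and a literal ä produce the same string, so a pattern compiled from either form should match ä as long as the text being searched is also Unicode. A minimal sketch, assuming Python 3 (where every str is Unicode, so no u'' prefixes are needed); the sample word "päivä" is only an illustrative input:

```python
import json
import re

# The same key written once with literal characters and once with JSON \u escapes.
literal = json.loads('{"vowel": "a|e|i|o|u|y|ä|ö"}')
escaped = json.loads('{"vowel": "a|e|i|o|u|y|\\u00e4|\\u00f6"}')

# The json module decodes both spellings to identical Unicode strings.
assert literal["vowel"] == escaped["vowel"]

# A pattern compiled from the escaped version still matches the real character.
vowel_re = re.compile(escaped["vowel"])
print(vowel_re.findall("päivä"))  # ['ä', 'i', 'ä']
```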
I am using Python 2.7.2 on Mac OS X. My editor is Sublime Text 2, and I read the output in the editor's built-in console. I come from the JavaScript world, where I don't have this problem.
Thank you in advance, and I'm terribly sorry if my question is a duplicate!
Answer:
If you try
print repr(json_data['vowel'])
you'll see that the value is displayed without any error, i.e., the problem is not json but printing Unicode. Try
print u"\xe4"
It should produce the same UnicodeEncodeError. Configure your editor to allow printing Unicode from Python. You could try setting the PYTHONIOENCODING=utf-8 environment variable for the editor's built-in console (or whatever encoding that console uses).
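Both errors in the question are most likely the same mismatch between Unicode text and the ASCII codec. A small Python 3 sketch that reproduces them outside the editor, assuming (as the traceback suggests) that the console encoding is ASCII: encoding ä as ASCII fails, UTF-8 stores ä as the bytes 0xc3 0xa4, and decoding those bytes as ASCII fails on 0xc3, the byte named in the UnicodeDecodeError.

```python
text = "\xe4"  # 'ä' is code point U+00E4, outside ASCII's range 0-127

# Printing to a console whose encoding is ASCII boils down to this encode step.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_msg = str(exc)
print(encode_msg)  # 'ascii' codec can't encode character '\xe4' in position 0: ordinal not in range(128)

# UTF-8 stores 'ä' as two bytes, the first of which is 0xc3.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa4'

# Reading those UTF-8 bytes back as ASCII fails on that first byte.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    decode_msg = str(exc)
print(decode_msg)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```

Setting PYTHONIOENCODING=utf-8 changes the codec used in the first step, which is why it can make the print succeed.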
Unrelated to your issue, you could slightly simplify loading the UTF-8 encoded JSON file:
import json
with open("expr.json", "rb") as f:
    json_data = json.load(f)
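A Python 3 sketch of the same loading step that also compiles the values with re.compile, as the question intends. The temporary file stands in for expr.json, and load_patterns is a hypothetical helper name. Opening the file in text mode with encoding="utf-8" decodes the bytes once, so every value is already a Unicode string by the time re.compile sees it.

```python
import json
import os
import re
import tempfile


def load_patterns(path):
    """Load a JSON object of regex strings and compile each value (hypothetical helper)."""
    # Text mode with an explicit encoding turns the UTF-8 bytes into str once.
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {name: re.compile(pattern) for name, pattern in raw.items()}


# Write a stand-in for expr.json so the example is self-contained.
content = '{"vowel": "a|e|i|o|u|y|ä|ö", "consonant": "b|c|d|f|g|h|j|k|l|m|n|p|r|s|š|t|v|z|ž"}'
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    patterns = load_patterns(path)
finally:
    os.remove(path)  # clean up the temporary stand-in file

print(patterns["vowel"].findall("öljy"))  # ['ö', 'y']
```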