Reputation: 19969
I'm using Python to pick apart some data, so I'm definitely a noob here, but I didn't see a satisfactory answer.
I have a UTF-8 JSON file with some entries that contain grave accents, acute accents, etc. I'm using codecs and have (for example):
import codecs
import json
# 'f' instead of 'str', so the built-in str type is not shadowed
f = codecs.open('../../publish_scripts/locations.json', 'r', 'utf-8')
locations = json.load(f)
for location in locations:
    print location['name']
Does anything special need to be done for printing? It gives me the following error:
'ascii' codec can't encode character u'\xe9' in position 5
That looks like the correct utf-8 value for e-acute. I suspect I'm doing something wrong when printing. Would the iteration cause it to lose its utf-8-ness?
The PHP and Ruby versions handle the utf-8 piece fine; is there some looseness in those languages that Python doesn't allow?
thx
Upvotes: 1
Views: 287
Reputation: 25426
The standard I/O streams are broken for non-ASCII character I/O in Python 2 and some site.py setups. Basically, you need to call sys.setdefaultencoding('utf8') (or whatever the system locale's encoding is) very early in your script. With the site.py shipped in Ubuntu, you need to imp.reload(sys) to make sys.setdefaultencoding available. Alternatively, you can wrap sys.stdout (and stdin and stderr) in unicode-aware readers/writers, which you can get from codecs.getreader / getwriter.
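As a rough sketch of the wrapping approach, here is codecs.getwriter applied to a byte stream. An io.BytesIO object stands in for sys.stdout so the result can be inspected; the variable names and the sample text are just for illustration.

```python
import codecs
import io

# Stand-in for a byte-oriented stream such as sys.stdout in Python 2.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
out = codecs.getwriter('utf-8')(raw)
out.write(u'Caf\xe9\n')  # the text is encoded to UTF-8 as it is written

encoded = raw.getvalue()
```

In a Python 2 script the equivalent would presumably be sys.stdout = codecs.getwriter('utf-8')(sys.stdout), done once near the top of the script.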
Upvotes: 0
Reputation: 22619
codecs.open() will decode the contents of the file using the codec you supplied (utf-8). You then have a Python unicode object (which behaves similarly to a string object).
Printing a unicode object will cause an implicit (behind-the-scenes) encode using the default codec, which is usually ascii. If ascii cannot encode all of the characters present, it will fail.
To print it, you should first encode it, thus:
for location in locations:
    print location['name'].encode('utf8')
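The implicit encode described above can be reproduced without printing anything. This is a minimal sketch using a made-up sample name (not taken from the asker's file) that happens to have its e-acute at index 5:

```python
name = u'Montr\xe9al'  # hypothetical sample value; e-acute sits at index 5

try:
    # Roughly what print does behind the scenes when the default codec is ascii.
    name.encode('ascii')
    error_position = None
except UnicodeEncodeError as exc:
    error_position = exc.start  # index of the first character ascii cannot encode

# Encoding explicitly with utf-8 succeeds and yields two bytes for e-acute.
utf8_bytes = name.encode('utf-8')
```

The unicode object itself is intact in both cases; only the choice of codec at output time differs.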
EDIT:
For your info, json.load() actually takes a file-like object (which is what codecs.open() returns). What you have at that point is neither a string nor a unicode object, but an iterable wrapper around the file.
By default json.load() expects the file to be utf8 encoded, so your code snippet can be simplified:
locations = json.load(open('../../publish_scripts/locations.json'))
for location in locations:
    print location['name'].encode('utf8')
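To see that json.load() does the decoding by itself, here is a small sketch that feeds it UTF-8 bytes directly. The io.BytesIO object and the sample data are stand-ins for the real locations.json, which I don't have.

```python
import io
import json

# Hypothetical sample data standing in for locations.json, as UTF-8 bytes.
raw = u'[{"name": "Montr\xe9al"}]'.encode('utf-8')

# json.load reads from any file-like object and decodes the UTF-8 input.
locations = json.load(io.BytesIO(raw))
name = locations[0]['name']
```

The value in name is already decoded text (8 characters, not 9 bytes), so the only remaining step is encoding it again for output.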
Upvotes: 3
Reputation: 5718
You're probably reading the file correctly. The error occurs when you're printing. Python tries to convert the unicode string to ascii, and fails on the character in position 5.
Try this instead:
print location['name'].encode('utf-8')
If your terminal is set to expect output in utf-8 format, this will print correctly.
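If you'd rather not hard-code utf-8, one option is to look at what the output stream says about itself. This is only a sketch: encode_for_stream and FakeStream are names I made up for illustration, and falling back to utf-8 is an assumption about what your terminal expects.

```python
def encode_for_stream(text, stream, fallback='utf-8'):
    """Encode text with the stream's declared encoding, or a fallback."""
    # sys.stdout.encoding can be None (e.g. when output is piped in Python 2).
    encoding = getattr(stream, 'encoding', None) or fallback
    # 'replace' substitutes '?' for unencodable characters instead of raising.
    return text.encode(encoding, 'replace')


class FakeStream(object):
    """Hypothetical stand-in for sys.stdout with a configurable encoding."""
    def __init__(self, encoding):
        self.encoding = encoding


piped = encode_for_stream(u'Montr\xe9al', FakeStream(None))
ascii_only = encode_for_stream(u'Montr\xe9al', FakeStream('ascii'))
```

In the asker's loop that would be something like print encode_for_stream(location['name'], sys.stdout).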
Upvotes: 2
Reputation: 246
It's the same as in PHP. UTF8 strings are good to print.
Upvotes: 0