Reputation: 11
I am working on a school project using Google App Engine and Python 2.7. I am trying to output a nested dictionary like so: {city:[{song1:artist1},{song2:artist2}], city2:[{song1:artist1},{song2:artist2}]}. However, the city names and the songs are from around the world and contain special (non-ASCII) characters. When I print the dictionary, I get this string:
{u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}, ...
(where Osaka is the city, the \u escapes are the song title, and Takajin Yashiki is the artist)
Does anyone know how to get the name of the cities/songs to appear correctly?
Upvotes: 1
Views: 311
Reputation: 36028
As in "How to print national characters in list representation?", you need a custom routine that prints the strings themselves instead of their repr:
def nrepr(data):
    # Build the output from the strings themselves instead of their repr().
    city_items = []
    for city, jukebox in data.iteritems():
        jukebox_items = []
        for song, artist in jukebox.iteritems():
            jukebox_items.append(u'"%s":"%s"' % (song, artist))
        city_items.append(u'"%s":{%s}' % (city, u",".join(jukebox_items)))
    return u'{%s}' % u",".join(city_items)
>>> data={u'Osaka':{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d':u'Takajin Yashiki'}}
>>> print nrepr(data)
{"Osaka":{"ゆめいらんかね":"Takajin Yashiki"}}
(use from __future__ import unicode_literals at the start of the file to avoid putting u before every literal)
You are not constrained to mimicking Python's default output format; you can print the data any way you like.
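Note that nrepr above expects a dict of dicts, while the structure in the question is a dict of lists of dicts. A minimal recursive sketch that handles both could look like the following. The name format_nested and the text_type alias are hypothetical, the code is written to run on Python 2.7 and Python 3, and it does not escape quotes inside the strings:

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def format_nested(value):
    """Render nested dicts/lists with strings shown as-is, not as repr()."""
    if isinstance(value, dict):
        items = [u'%s: %s' % (format_nested(k), format_nested(v))
                 for k, v in value.items()]
        return u'{%s}' % u', '.join(items)
    if isinstance(value, (list, tuple)):
        return u'[%s]' % u', '.join(format_nested(v) for v in value)
    if isinstance(value, text_type):
        # Quote text ourselves so non-ASCII characters are never escaped.
        return u'"%s"' % value
    # Numbers, None, etc.: fall back to their plain text form.
    return text_type(value)


data = {u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}]}
print(format_nested(data))
```

On a console that can display Japanese, this should print {"Osaka": [{"ゆめいらんかね": "Takajin Yashiki"}]}.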
Alternatively, you can use a unicode subclass for your strings whose repr keeps the national characters:
class nu(unicode):
    def __repr__(self):
        return self.encode('utf-8')  # must return str
>>> data={nu(u'Osaka'):{nu(u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d'):nu(u'Takajin Yashiki')}}
>>> data
{Osaka: {ゆめいらんかね: Takajin Yashiki}}
This is problematic because repr output is presumed to contain only ASCII characters, and various code relies on this. You are extremely likely to get UnicodeErrors in random places. It will also print mojibake if a specific output channel's encoding is different from utf-8 or if further transcoding is involved.
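If JSON-style output is acceptable, a less fragile option than overriding repr is the standard library's json.dumps with ensure_ascii=False, which leaves non-ASCII characters unescaped. This is only a sketch under that assumption; in Python 2 the result may be a unicode object rather than a str, so encode it explicitly before writing to a byte stream:

```python
import json

data = {u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}]}

# ensure_ascii=False keeps non-ASCII characters instead of emitting \uXXXX escapes.
text = json.dumps(data, ensure_ascii=False)
print(text)
```

Unlike a hand-written formatter, this also takes care of escaping quotes and backslashes inside the strings.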
Upvotes: 0
Reputation: 23186
The underlying issue in Python 2.7 is that printing a dictionary involves converting it to a string, and that string will be a str rather than a unicode. The conversion calls repr on every key and value, and the repr of a unicode object escapes non-ASCII characters. Hence your output.
However, when you render the individual items, you will find they are fine:
>>> d = {u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}]}
>>> for k, v in d.viewitems():
...     for pair in v:
...         for song, artist in pair.viewitems():
...             print k, song, artist
...
Osaka ゆめいらんかね Takajin Yashiki
Note that this is Python 2 behavior. In Python 3, where str is text, repr keeps printable non-ASCII characters, so this data will be printed as is and should render naturally in the console, assuming the console encoding (typically UTF-8) can represent it and you have the necessary fonts installed for Japanese glyphs:
(3.7) >>> print(d)
{'Osaka': [{'ゆめいらんかね': 'Takajin Yashiki'}]}
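To see the difference between the two behaviors side by side, the Python 3 built-in ascii() can be used: it escapes non-ASCII characters much like Python 2's repr did. A small sketch for Python 3 only (the variable names readable and escaped are illustrative):

```python
d = {'Osaka': [{'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': 'Takajin Yashiki'}]}

readable = repr(d)   # Python 3: printable non-ASCII characters are kept
escaped = ascii(d)   # escapes non-ASCII characters, similar to Python 2's repr()

print(readable)
print(escaped)
```

The second line of output contains the same \u3086... escapes as in the question, only without the u prefixes.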
Upvotes: 1