Katherine Waller

Reputation: 11

Want Python 2.7 to print out foreign characters

I am working on a school project using Google App Engine and Python 2.7. I am trying to output a nested dictionary like so: {city:[{song1:artist1},{song2:artist2}], city2:[{song1:artist1},{song2:artist2}]}. However, the city names and the songs are from around the world, with special foreign characters. When I print out the dictionary, I get this string:

{u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}, etc... (where Osaka is the city, the escaped Unicode is the song, and Takajin Yashiki is the artist)

Does anyone know how to get the name of the cities/songs to appear correctly?

Upvotes: 1

Views: 311

Answers (2)

ivan_pozdeev

Reputation: 36028

As in "How to print national characters in list representation?", you need a custom routine that prints the strings themselves instead of their repr:

def nrepr(data):
    # Expects {city: {song: artist}}. The text is built from the strings
    # themselves, so no repr() escaping takes place.
    city_items = []
    for city, jukebox in data.iteritems():
        jukebox_items = []
        for song, artist in jukebox.iteritems():
            jukebox_items.append(u'"%s":"%s"' % (song, artist))
        city_items.append(u'"%s":{%s}' % (city, u",".join(jukebox_items)))
    return u'{%s}' % u",".join(city_items)

>>> data={u'Osaka':{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d':u'Takajin Yashiki'}}
>>> print nrepr(data)
{"Osaka":{"ゆめいらんかね":"Takajin Yashiki"}}

(Put from __future__ import unicode_literals at the top of the file to avoid writing u before every literal.)

You are not constrained to mimicking Python's default output format; you can print the data any way you like.
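
If a JSON-style layout is acceptable, the standard library's json module can do the formatting for you. This is a minimal sketch, not part of the original answer: the variable names are illustrative, and the data uses the list-of-dicts shape from the question. It relies on json.dumps with ensure_ascii=False, which leaves non-ASCII characters as they are instead of writing \uXXXX escapes.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # harmless on Python 3

import json

# Illustrative data in the shape described in the question.
data = {'Osaka': [{'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': 'Takajin Yashiki'}]}

# ensure_ascii=False keeps non-ASCII characters instead of escaping them.
text = json.dumps(data, ensure_ascii=False)
# text == '{"Osaka": [{"ゆめいらんかね": "Takajin Yashiki"}]}'
```

On Python 2 the result is a unicode object here, so encode it (for example text.encode('utf-8')) before writing it to a byte stream.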


Alternatively, you can use a unicode subclass for your strings whose repr keeps national characters:

class nu(unicode):
    def __repr__(self):
        return self.encode('utf-8')    #must return str

>>> data={nu(u'Osaka'):{nu(u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d'):nu(u'Takajin Yashiki')}}
>>> data
{Osaka: {ゆめいらんかね: Takajin Yashiki}}

This is problematic because repr output is presumed to contain only ASCII characters, and various code relies on this. You are extremely likely to get UnicodeErrors in random places. It will also print mojibake if the output channel's encoding is not UTF-8 or if further transcoding is involved.

Upvotes: 0

donkopotamus

Reputation: 23186

The underlying issue in Python 2.7 is that printing a dictionary converts it to a str rather than a unicode, and that str is built from the repr of each key and value, which escapes non-ASCII characters. Hence your output.
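
A small sketch of that mechanism, with illustrative variable names: the text of a container is assembled from the repr of its elements, not from their str. The snippet is written to run on Python 2.7 and Python 3; only the content of the repr differs between them (escaped on 2.7, readable on 3).

```python
from __future__ import unicode_literals  # harmless on Python 3

song = '\u3086\u3081'

# What a container shows for this element.
as_item = repr(song)

# str() of a list or dict is built from repr() of the items inside it.
as_list = str([song])
as_dict = str({song: 1})
```

Here as_list is the repr of the song wrapped in square brackets, which is why the escapes show up as soon as the string sits inside a list or dict on Python 2.7.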

However, when you render the individual items, you will find they are fine:

>>> d = {u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}]} 
>>> for k, v in d.viewitems():
...   for pair in v:
...     for song, artist in pair.viewitems():
...         print k, song, artist
... 
Osaka ゆめいらんかね Takajin Yashiki
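
If you need the result as one string rather than printed line by line (for example, to hand to a template or a response body), the same loop can collect its output. This is only a sketch under that assumption: format_playlist and the line layout are hypothetical names and choices, not something from the question.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # harmless on Python 3


def format_playlist(data):
    """Return one line of text per (city, song, artist) entry."""
    lines = []
    for city, songs in sorted(data.items()):
        for pair in songs:
            for song, artist in sorted(pair.items()):
                # %-formatting uses the strings themselves, not their repr.
                lines.append('%s: %s (%s)' % (city, song, artist))
    return '\n'.join(lines)


d = {'Osaka': [{'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': 'Takajin Yashiki'}]}
report = format_playlist(d)
# report == 'Osaka: ゆめいらんかね (Takajin Yashiki)'
```

Sorting the cities and songs is a design choice that keeps the output order stable, since plain dicts in Python 2.7 are unordered.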

Note that the escaped output is Python 2 behavior. In Python 3, where str is text, repr keeps printable non-ASCII characters, so the dictionary should render naturally in the console, assuming the console encoding can represent them and you have the necessary fonts installed for Japanese glyphs:

(3.7) >>> print(d)
{'Osaka': [{'ゆめいらんかね': 'Takajin Yashiki'}]}

Upvotes: 1
