mKeey

Reputation: 135

Python UTF8 encoding 2.7.5 / 3.8.5

I'm running Python 3.8.5 on Windows and Python 2.7.5 on my web server, and I'm trying to understand why they behave differently.

I'm translating values using a table loaded from a JSON file, with code like this:

hash = ""
try:
    hash = str(translateTable[item["hash"]])
except:
    hash = str(item["hash"])

The following code loads the JSON file:

import io
import json

with io.open('translate.json', encoding="utf-8") as fh:
    translateTable = json.load(fh)

Contents of translate.json:

{"vunk": "Vunk-Gerät"}

When I run the code on Windows with 3.8.5, the result is as expected:

IN >>> python test.py
OUT>>> Vunk-Gerät

Here comes the tricky part: when I run it on my web server with Python 2.7.5, the result is this:

IN >>> python test.py
OUT>>> vunk

The problem is that on the web server it can't translate values containing "Ä, Ö, Ü, ß", and I don't understand why.

Upvotes: 0

Views: 476

Answers (2)

mKeey

Reputation: 135

For anyone facing the same problem, here is the solution for 2.7.5:

from django.utils.encoding import smart_str
hash = ""
try:
    hash = smart_str(translateTable[item["hash"]])
except Exception as ex:
    hash = smart_str(item["hash"])

Also make sure Django is installed:

pip install django

Upvotes: 1

snakecharmerb

Reputation: 55699

The most likely problem is that the values loaded from the JSON file are unicode rather than str. In Python 2, unicode is the equivalent of str in Python 3, and Python 2's str is the equivalent of Python 3's bytes. So the problem may be:

>>> transtable = {u"vunk": u"Vunk-Gerät"}
>>> str(transtable['vunk'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 8: ordinal not in range(128)

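This happens because Python 2's str tries to encode u"Vunk-Gerät" to ASCII, but it cannot (because of the "ä"). In the question's code the bare except swallows this UnicodeEncodeError, so the fallback branch runs and the untranslated key "vunk" is printed instead of a traceback.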
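The same failure can be reproduced on either Python version by encoding explicitly, which is what Python 2's str does implicitly. This is only an illustrative sketch; the variable names are made up, and the "ä" is written as an escape so the snippet does not depend on the source file's encoding.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# u"Vunk-Gerät", with the umlaut written as an escape sequence.
text = u"Vunk-Ger\xe4t"

# ASCII cannot represent "ä", so this encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent it; "ä" takes two bytes.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)
print(len(text), len(utf8_bytes))
```

The text has 10 characters but 11 UTF-8 bytes, because the "ä" is encoded as two bytes.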

The simplest solution might be to avoid calling str at all:

hash = ""
try:
    hash = translateTable[item["hash"]]
except Exception as ex:
    hash = item["hash"]

since the keys and values should be usable as they are.
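If a missing key is the only failure you expect, dict.get gives the same fallback without a try/except that could hide other errors. A minimal sketch, where translate is a hypothetical helper name and the table is written inline rather than loaded from translate.json:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def translate(table, key):
    """Return the translation for key, or the key itself if there is none."""
    return table.get(key, key)


# Stand-in for the dict that json.load would return.
translate_table = {"vunk": "Vunk-Ger\xe4t"}

translated = translate(translate_table, "vunk")
untranslated = translate(translate_table, "missing")
```

Because str is never called, the values stay text on both Python 2 and Python 3.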

A more robust approach would be to use the six library to handle string and bytes types in a way that works with both Python 2 and Python 3. The ideal solution, as others have pointed out, is to run Python 3 on your server. Python 3 is much easier to use when processing non-ASCII text.
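What such a compatibility helper does can be sketched without the extra dependency. The function below, to_native_str, is a hypothetical name and not part of six; it returns the native str type on either version, meaning UTF-8 encoded bytes on Python 2 and text on Python 3. six provides six.ensure_str for roughly this purpose.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2


def to_native_str(value, encoding="utf-8"):
    """Coerce text or bytes to the native str type of the running Python."""
    if PY2:
        # Python 2: native str is bytes, so encode unicode objects.
        if isinstance(value, unicode):  # noqa: F821 (name exists only on Python 2)
            return value.encode(encoding)
        return value
    # Python 3: native str is text, so decode bytes objects.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Values that are already the native str type are returned unchanged.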

Upvotes: 1
