rihekopo

Reputation: 3350

Can't display Unicode chars with Flask

I have some strings in my database containing Unicode characters that I can't display properly on my website. However, it works correctly in one situation, which is interesting.

So it works when I do this:

@app.route('/')
def main():
    return render_template('home.html', text='\u00e9ps\u00e9g')
# displays: épség

But it does not work when I do this (query the database and pass the resulting string):

@app.route('/')
def main():
    text_string = getText()
    return render_template('home.html', text=text_string)
# displays: \u00e9ps\u00e9g

However, when I take exactly the string that the second version returns and use it in the first version, it works perfectly.

I'd like to understand why the first version works and the second does not. Both strings should be the same, but when I get the string from the server, it is displayed exactly as stored, escape sequences included. When I add it manually, it displays correctly. Unfortunately, I have hundreds of strings, so I need to use the second approach.

Upvotes: 2

Views: 3300

Answers (1)

Mark Tolonen

Reputation: 177554

In one case you have Unicode escape sequences, each of which represents a single Unicode character. In the other case you have the literal characters \, u, 0, 0, e, 9, i.e. six characters per escape. This can be illustrated using raw strings, which do not process escape sequences:

>>> text = '\u00e9ps\u00e9g'
>>> print(text)
épség
>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g
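To check which of the two cases a string from the database falls into, comparing len() and repr() of both forms is a quick diagnostic. This is a small sketch using the same example text; the variable names are only illustrative:

```python
# A normal literal: Python turns each \u00e9 escape into the single character é.
escaped = '\u00e9ps\u00e9g'
# A raw literal keeps the backslashes, like text stored with literal escapes.
literal = r'\u00e9ps\u00e9g'

print(escaped == literal)  # False: the contents differ
print(len(escaped))        # 5 characters: é, p, s, é, g
print(len(literal))        # 15 characters, including two backslashes
print(repr(literal))       # repr shows each backslash doubled
```

If repr() of the value returned by the database query shows doubled backslashes, the string contains literal escape sequences rather than the characters themselves.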

To convert a Unicode string containing literal escape sequences, you first need a byte string, which you then decode with the unicode_escape codec. Since the escape sequences stand in for the non-ASCII characters, the string is pure ASCII, so you can obtain that byte string by encoding with the ascii codec:

>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g
>>> print(text.encode('ascii').decode('unicode_escape'))
épség
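For hundreds of database strings, the same two-step conversion can be wrapped in a small helper and applied to each value before it is passed to render_template. This is a sketch: the name decode_escapes is hypothetical, and it assumes the stored text is pure ASCII. Note that unicode_escape also interprets other backslash sequences such as \n or \\, not only \uXXXX.

```python
def decode_escapes(s: str) -> str:
    r"""Turn literal escape sequences such as \u00e9 into real characters.

    Hypothetical helper. Assumes s is pure ASCII: encode('ascii') raises
    UnicodeEncodeError if s already contains non-ASCII characters.
    """
    return s.encode('ascii').decode('unicode_escape')


print(decode_escapes(r'\u00e9ps\u00e9g'))  # épség
```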

From your comment, it sounds like your text may come from a JSON data file. If it is proper JSON, this should decode it:

>>> import json
>>> s = r'"\u00e9ps\u00e9g \ud83c\udf0f"'
>>> print(s)
"\u00e9ps\u00e9g \ud83c\udf0f"
>>> print(json.loads(s))
épség 🌏

Note that a JSON string is enclosed in double quotes; without them, json.loads would not decode it.
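The 🌏 in the JSON example is written as a UTF-16 surrogate pair (\ud83c\udf0f), which is one practical reason to prefer json.loads when the data really is JSON. A sketch comparing the two approaches on just that pair: unicode_escape decodes each \uXXXX on its own and leaves two lone surrogates, while the json module combines the pair into a single code point. The lone surrogates are deliberately not printed, because writing them to a UTF-8 console would raise an error.

```python
import json

raw = r'\ud83c\udf0f'  # surrogate-pair escape for U+1F30F, as JSON writes it

# unicode_escape handles each \uXXXX separately: two lone surrogates.
via_codec = raw.encode('ascii').decode('unicode_escape')
# json.loads joins the pair into one character (input must be a quoted string).
via_json = json.loads('"' + raw + '"')

print(len(via_codec))      # 2
print(len(via_json))       # 1
print(hex(ord(via_json)))  # 0x1f30f
```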

Upvotes: 4
