Reputation: 3350
I have some strings in my database containing Unicode characters that I can't display properly on my website. Interestingly, it does work correctly in one situation.
It works when I do this:
@app.route('/')
def main():
    return render_template('home.html', text='\u00e9ps\u00e9g')
# displays: épség
But it does not work when I do this (query the database and pass the string from result):
@app.route('/')
def main():
    text_string = getText()
    return render_template('home.html', text=text_string)
# displays: \u00e9ps\u00e9g
However, when I take exactly the string I get back in the second version and paste it into the first version as a literal, it displays perfectly.
I would like to understand why the first version works and the second does not. Both strings should be the same, but the one I fetch is displayed unchanged, escape sequences and all, while the one I type in manually renders correctly. Unfortunately I have hundreds of strings, so I need to use the second approach.
Upvotes: 2
Views: 3300
Reputation: 177554
In the first case you have Unicode escape sequences in a Python string literal, each of which the interpreter turns into a single Unicode character. In the second case the string contains the literal characters \, u, 0, 0, e, 9, which is six separate characters. This can be illustrated with raw strings, which do not process Unicode escape sequences:
>>> text = '\u00e9ps\u00e9g'
>>> print(text)
épség
>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g
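A quick way to tell which kind of string you actually have is to compare lengths. This is only a small sketch; the variable names `escaped` and `literal` are made up for illustration.

```python
# Escape sequences processed by the interpreter: 5 characters in total.
escaped = '\u00e9ps\u00e9g'

# Raw string: every backslash, 'u' and hex digit is its own character.
literal = r'\u00e9ps\u00e9g'

print(len(escaped))        # 5
print(len(literal))        # 15
print(escaped == literal)  # False
print(literal[:6])         # the first six characters: \u00e9
```

If `len()` of a value coming from the database looks like the second case, the escapes were stored as plain text.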
To convert a Unicode string that contains literal escape sequences, you first need a byte string, which you then decode with the unicode_escape codec. To obtain a byte string from a Unicode string whose non-ASCII characters are all written as literal escape codes, encode it with the ascii codec:
>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g
>>> print(text.encode('ascii').decode('unicode_escape'))
épség
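Since there are hundreds of strings, the conversion above could be wrapped in a small helper and applied to each value before it is passed to the template. This is a minimal sketch: `decode_literal_escapes` is a hypothetical name, and `stored` stands in for whatever the database query returns.

```python
def decode_literal_escapes(text):
    # Assumption: text is pure ASCII with escapes stored as plain characters.
    # encode('ascii') raises UnicodeEncodeError if a real non-ASCII
    # character is already present in the string.
    return text.encode('ascii').decode('unicode_escape')


# Stand-in for a value fetched from the database (hypothetical).
stored = r'\u00e9ps\u00e9g'

fixed = decode_literal_escapes(stored)
print(fixed)  # épség
```

Note that unicode_escape also interprets other backslash sequences such as \n and \t, so this is only appropriate when every backslash in the stored text is meant as an escape.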
From your comment you may have text from a JSON data file. If it is proper JSON, this should decode it:
>>> import json
>>> s = r'"\u00e9ps\u00e9g \ud83c\udf0f"'
>>> print(s)
"\u00e9ps\u00e9g \ud83c\udf0f"
>>> print(json.loads(s))
épség 🌏
Note that a JSON string is quoted. It would not decode without the double-quotes.
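One practical difference between the two approaches shows up with characters outside the Basic Multilingual Plane, such as the emoji above. The sketch below compares them under the assumption that the stored text has no surrounding quotes and contains no unescaped double quotes or control characters, so it can be wrapped in quotes by hand; the variable names are illustrative only.

```python
import json

# Stand-in for an unquoted value with a surrogate pair written as escapes.
raw = r'\u00e9ps\u00e9g \ud83c\udf0f'

# unicode_escape decodes each \uXXXX on its own, leaving two lone surrogates.
via_codec = raw.encode('ascii').decode('unicode_escape')

# The JSON decoder combines the surrogate pair into one code point.
via_json = json.loads('"' + raw + '"')

print(len(via_codec))  # 8
print(len(via_json))   # 7

# Lone surrogates can be merged by a round trip through UTF-16.
repaired = via_codec.encode('utf-16', 'surrogatepass').decode('utf-16')
print(repaired == via_json)  # True
```

Printing `via_codec` directly would typically fail with a UnicodeEncodeError, because lone surrogates cannot be encoded as UTF-8, which is one reason to prefer `json.loads` when the data really is JSON.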
Upvotes: 4