Reputation: 138
I am building a web page with a Python back end on Flask. Everything is working well and I'd strongly recommend Flask. But when it comes to Unicode and encoding, it's always hard to get Python, the web page, etc. to agree.
I have a form that I post to a specific Flask route. I get my values, and I need a small wrapper to put my variables in the right order.
I got this dict:
task_formatted.append(str(item['entity']))
I transform it to a str, then append it to a list so I can easily pass it to my template.
I'd expect the str to be rendered as UTF-8 on the web page.
Python page:
# -*- coding: utf-8 -*-
html page:
<meta charset="utf-8"/>
I then print them in my page using Jinja:
{% for item in task %}
<tr>
<td>{{item[0].decode('utf-8')}}</td>
<td>{{item[1].decode('utf-8')}}</td>
<td>{{item[2]}}</td>
<td>{{item[3]}}</td>
<td>{{item[4]}}</td>
<td><button id="taskmodal1"></td>
</tr>
{% endfor %}
but my item[0].decode('utf-8') and my item[1].decode('utf-8') are printing:
{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\xc3\xa9'}
instead of
{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset commé'}
I have tried several approaches: .encode('utf-8') on the Python side, unicode(str), and render_template().encode('utf-8'). I am running out of ideas.
To be fair, I think there is something I haven't understood about Unicode, so I'd like some explanations (not documentation links, because I have most likely already read them) or some solutions to get it working.
It's very important for my program to write the str properly, as I use it afterwards in JS HTTP calls.
Thanks
PS: I am using python2
Upvotes: 0
Views: 1300
Reputation: 55669
I got this dict:
task_formatted.append(str(item['entity']))
I transform it to a str, then append it to a list so I can easily pass it to my template
This code doesn't do what you think it does.
>>> entity = {'type': 'Asset', 'id': 1404, 'name': 'Test-Asset commé'}
>>> str(entity)
"{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\\xc3\\xa9'}"
When you call str on a dictionary (or a list), you do not get the result of calling str on each of the dictionary's keys and values: you get the repr of each key and value. In this case this means that 'Test-Asset commé' has been transformed to 'Test-Asset comm\xc3\xa9' in a way that is difficult to reverse.
>>> str(entity).decode('utf-8') # <- this doesn't work.
u"{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\\xc3\\xa9'}"
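To see why the decode has no effect, here is a minimal sketch. It uses a byte literal (`name_bytes`, a made-up name for illustration) so that it behaves the same way on Python 2 and Python 3; the exact string form of the list differs slightly between the two, but the escaping is the same.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for u'comm\u00e9', as a Python 2 byte str would hold them.
name_bytes = b'comm\xc3\xa9'

# str() on a container formats each element with repr(), so each byte of
# the accented letter is written out as four literal ASCII characters
# (backslash, 'x', and two hex digits).
as_text = str([name_bytes])

# Decoding the original bytes, not the container's string form,
# is what recovers the text.
decoded = name_bytes.decode('utf-8')
```

Once the escapes are literal characters in the string, there are no UTF-8 bytes left for decode('utf-8') to interpret.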
If you want to render your dictionaries in the template using just {{ item }}, you could use the json module to serialise them instead of str. Note that you need to convert the json (which is of type str) to a unicode instance to avoid a UnicodeDecodeError when rendering the template.
>>> import json
>>> import jinja2
>>> template = jinja2.Template(u"""<td>{{item}}</td>""")
>>> j = json.dumps(entity, ensure_ascii=False)
>>> uj = unicode(j, 'utf-8')
>>> print template.render(item=uj)
<td>{"type": "Asset", "id": 1404, "name": "Test-Asset commé"}</td>
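The conversion above could be wrapped in a small helper so every item is handled the same way. This is only a sketch: `entity_to_text` is a hypothetical name, and `sort_keys=True` is added here just to make the output order predictable.

```python
# -*- coding: utf-8 -*-
import json


def entity_to_text(entity):
    """Hypothetical helper: serialise a dict to JSON as text, not bytes."""
    serialised = json.dumps(entity, ensure_ascii=False, sort_keys=True)
    # On Python 2, json.dumps may return a byte str; decode it so the
    # template only ever receives text.
    if isinstance(serialised, bytes):
        serialised = serialised.decode('utf-8')
    return serialised


entity = {'type': 'Asset', 'id': 1404, 'name': u'Test-Asset comm\u00e9'}
item = entity_to_text(entity)
```

In the route, each entity would then be appended as `task_formatted.append(entity_to_text(item['entity']))` rather than going through str().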
Some general observations / takeaways:
Don't use str (or unicode) to serialise containers like dictionaries or lists; use tools like json or pickle.
Pass unicode, not str, to your templates.
In Python 2, when handling text, always use unicode, never use str.
Upvotes: 1
Reputation: 138
I found a solution for my problem:
unicodedata.normalize('NFKD', unicode(str(item['entity']['type']) + str(item['entity']['name']),'utf-8'))
First I transform the values from my dict to strings with str(), then I turn the result into Unicode with unicode(..., 'utf-8'), and finally, after importing unicodedata, I use unicodedata.normalize().
Hope it helps people.
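A note on the normalize() step, as a small sketch with a made-up sample string: NFKD does not just pass the text through. It decomposes accented letters into a base letter plus a combining mark, so the result is a different sequence of code points. That may or may not be what the later JS HTTP calls expect.

```python
# -*- coding: utf-8 -*-
import unicodedata

composed = u'comm\u00e9'  # ends with the single code point U+00E9

# NFKD splits U+00E9 into 'e' followed by the combining acute accent U+0301.
decomposed = unicodedata.normalize('NFKD', composed)

# NFC puts the two code points back together.
recomposed = unicodedata.normalize('NFC', decomposed)
```

If the composed form is needed downstream, 'NFC' is the form that keeps it.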
Upvotes: 0
Reputation: 9523
You are doing things wrong.
<td>{{item[0].decode('utf-8')}}</td>
Why do you add the decode? This is wrong. I recommend not putting any conversion function in the template. UTF-8 will work fine (and I think it is the default). In any case, keep the two terms straight: encoding means writing a string out in a code such as UTF-8, and decoding means going from a specific coded value back to a semantic value. In fact, in Python you should not care how strings are coded internally (BTW, the internal coding is a sort of UTF-8, latin1, UTF-16 or UTF-32, according to the most efficient way to encode the entire string).
Just remove the decode('utf-8'). In Python code, you should not care about encoding and decoding except on input and output: use the sandwich rule. This will hugely simplify the handling of strings and logic, and it avoids most bugs.
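A minimal sketch of the sandwich rule, assuming UTF-8 at both edges. The helper names `to_text` and `to_bytes` are made up for illustration and the byte string stands in for whatever arrives from the form.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Input edge: decode bytes to text once; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(text, encoding='utf-8'):
    """Output edge: encode text to bytes once, just before it leaves."""
    return text.encode(encoding)


raw = b'Test-Asset comm\xc3\xa9'   # bytes as they arrive from outside
text = to_text(raw)                 # the middle of the sandwich works on text
label = text.upper()                # string logic sees characters, not bytes
out = to_bytes(label)               # bytes again only on the way out
```

With this layout, the template receives text and needs no decode or encode calls of its own.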
Upvotes: 0