Aurele Collinet

Reputation: 138

Unicode bug flask jinja2

I am building a web page with a Python back end on Flask. Everything is working great, and I'd highly recommend Flask. But when it comes to Unicode and encoding, it's always hard to get things right between Python, the web page, etc.

So I have a form that I post to a specific Flask route. I get my values, and I need a small wrapper to get my variables in the right order.

I got this dict:

            task_formatted.append(str(item['entity']))

I transform it to a str, then append it to a list so I can easily pass it to my template.

I'd expect the str to be rendered as UTF-8 on the web page. In the Python file:

  # -*- coding: utf-8 -*- 

In the HTML page:

  <meta charset="utf-8"/>

I then print them in my page using Jinja:

            {% for item in task %}
            <tr>
              <td>{{item[0].decode('utf-8')}}</td>
              <td>{{item[1].decode('utf-8')}}</td>
              <td>{{item[2]}}</td>
              <td>{{item[3]}}</td>
              <td>{{item[4]}}</td>
              <td><button id="taskmodal1"></button></td>
            </tr>
            {% endfor %}

But my item[0].decode('utf-8') and item[1].decode('utf-8') are printing:

{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\xc3\xa9'}

instead of

{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset commé'}

I have tried several approaches: .encode('utf-8') on the Python side, unicode(str), and render_template().encode('utf-8'), and I am running out of ideas.

To be fair, I think there is something I don't understand about Unicode, so I'd like some explanation (not documentation links, because I have most likely already read them) or a solution that gets it working.

It's very important for my program to write the str correctly, because I use it afterwards in JS HTTP calls.

Thanks

PS: I am using Python 2.

Upvotes: 0

Views: 1300

Answers (3)

snakecharmerb

Reputation: 55669

I got this dict:

task_formatted.append(str(item['entity']))

I transform it to a str, then append it to a list so I can easily pass it to my template

This code doesn't do what you think it does.

>>> entity = {'type': 'Asset', 'id': 1404, 'name': 'Test-Asset commé'}
>>> str(entity)
"{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\\xc3\\xa9'}"

When you call str on a dictionary (or a list), you do not get the result of calling str on each of the dictionary's keys and values: you get the repr of each key and value. Here, this means that 'Test-Asset commé' has been transformed to 'Test-Asset comm\xc3\xa9' in a way that is difficult to reverse.
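A minimal sketch of the same effect, written so it runs under both Python 2.7 and Python 3. In Python 3 the UTF-8 byte string has to be an explicit `bytes` object, which is exactly what a Python 2 `str` holding "commé" is; the variable names are illustrative only:

```python
# A UTF-8 byte string: what a Python 2 `str` holding "commé" contains.
name_bytes = u'Test-Asset comm\u00e9'.encode('utf-8')
entity = {'type': 'Asset', 'id': 1404, 'name': name_bytes}

# str() on the container formats every value with repr(), so the two
# UTF-8 bytes of "é" end up as the eight escape characters \xc3\xa9.
container_text = str(entity)
print(container_text)

# Decoding the value itself, not the container's repr, yields real text.
name_text = entity['name'].decode('utf-8')
print(name_text == u'Test-Asset comm\u00e9')  # True
```

In Python 3 the repr also shows a `b'...'` prefix; in Python 2 it does not, but the escaped bytes are the same in both.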

>>> str(entity).decode('utf-8')  # <- this doesn't work.
u"{'type': 'Asset', 'id': 1404, 'name': 'Test-Asset comm\\xc3\\xa9'}"

If you want to render your dictionaries in the template using just {{ item }}, you could use the json module to serialise them instead of str. Note that you need to convert the json (which is of type str) to a unicode instance to avoid a UnicodeDecodeError when rendering the template.

>>> import json
>>> import jinja2
>>> template = jinja2.Template(u"""<td>{{item}}</td>""")
>>> j = json.dumps(entity, ensure_ascii=False)
>>> uj = unicode(j, 'utf-8')
>>> print template.render(item=uj)
<td>{"type": "Asset", "id": 1404, "name": "Test-Asset commé"}</td>
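Since the values are later used in JavaScript HTTP calls, JSON is also a format the browser side can parse directly. A stdlib-only sketch (no Jinja2 needed, runs under Python 2.7 and 3; `sort_keys=True` is there only to make the output order predictable) comparing the two `ensure_ascii` settings:

```python
import json

# Keep everything as text (unicode) from the start.
entity = {u'type': u'Asset', u'id': 1404, u'name': u'Test-Asset comm\u00e9'}

# ensure_ascii=False keeps "é" as a real character in the output text.
readable = json.dumps(entity, ensure_ascii=False, sort_keys=True)

# The default (ensure_ascii=True) writes it as the JSON escape \u00e9,
# which is pure ASCII and still decodes to "é" in any JSON parser.
ascii_only = json.dumps(entity, sort_keys=True)

print(ascii_only)  # {"id": 1404, "name": "Test-Asset comm\u00e9", "type": "Asset"}

# Both forms parse back to the same dictionary.
print(json.loads(readable) == json.loads(ascii_only) == entity)  # True
```

The ASCII-only form survives any layer that mishandles encodings; the `ensure_ascii=False` form is easier for humans to read.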

Some general observations / takeaways:

  • Don't use str (or unicode) to serialise containers like dictionaries or lists; use tools like json or pickle.
  • Ensure any string literals that you pass to jinja2 are instances of unicode, not str.
  • When using Python2, if there is any possibility that your code will process non-ascii values, always use unicode, never use str.
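Applied to the question's code, these points amount to building each table row from the individual values, decoded to text once, instead of from `str(item['entity'])`. In the sketch below, the `to_text` helper and the shape of `items` are hypothetical stand-ins for the asker's data, not part of Flask or Jinja2; it runs under Python 2.7 and 3:

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, leave text and other values alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical input shaped like the question's data; the name arrives as UTF-8 bytes.
items = [
    {'entity': {'type': b'Asset', 'id': 1404,
                'name': u'Test-Asset comm\u00e9'.encode('utf-8')}},
]

task_formatted = []
for item in items:
    entity = item['entity']
    # One row per task: separate text values the template can print as {{ row[0] }}, ...
    task_formatted.append((to_text(entity['type']), to_text(entity['name']), entity['id']))

print(task_formatted[0][1] == u'Test-Asset comm\u00e9')  # True
```

With rows like these, the template can print each cell directly, with no `.decode()` calls.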

Upvotes: 1

Aurele Collinet

Reputation: 138

I found a solution for my problem:

unicodedata.normalize('NFKD', unicode(str(item['entity']['type']) + str(item['entity']['name']),'utf-8'))

First I turn the dict's values into strings with str(), then I decode them from UTF-8 into a unicode object with unicode(s, 'utf-8'), and finally, after importing unicodedata, I use unicodedata.normalize().
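A note on the last step: `unicodedata.normalize('NFKD', ...)` does more than tidy the string. It decomposes "é" into a plain "e" followed by a combining acute accent (U+0301), so the result renders the same but has a different length and compares unequal to the precomposed form. The display problem is fixed by calling str() on the individual values and decoding them, not by the normalisation. A small check that runs under Python 2.7 and 3:

```python
import unicodedata

composed = u'comm\u00e9'  # "é" as a single code point (U+00E9)
decomposed = unicodedata.normalize('NFKD', composed)

print(len(composed))                 # 5
print(len(decomposed))               # 6
print(decomposed == u'comme\u0301')  # True: "e" + combining acute accent
print(composed == decomposed)        # False, although both render as "commé"

# NFC recomposes the characters into the single-code-point form.
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
```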

Hope it helps people.

Upvotes: 0

Giacomo Catenazzi

Reputation: 9523

You are doing things wrong.

<td>{{item[0].decode('utf-8')}}</td>

Why do you add the decode? This is wrong: I recommend not putting any conversion function in the template. UTF-8 will work fine (Flask serves responses as UTF-8 by default). Also keep the two directions straight: decoding turns bytes in some encoding (such as UTF-8) into text, i.e. a unicode object, and encoding turns text back into bytes. Inside Python you should not care how strings are coded internally (CPython 3.3+ stores each string as Latin-1, UCS-2 or UCS-4, whichever is most compact for that string).

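Just remove the decode('utf-8'). In Python code you should not care about encoding and decoding except at input and output: use the "Unicode sandwich" rule (decode bytes to text as soon as they arrive, work only with text inside the program, and encode to bytes only when sending output). This hugely simplifies string handling and logic, and it avoids most bugs.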
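A framework-free sketch of the sandwich rule, runnable under Python 2.7 and 3. The function name `handle` and its bytes-in/bytes-out shape are illustrative assumptions; in a Flask app the framework normally performs the outer decode and encode steps for you:

```python
def handle(raw_request_body):
    """Illustrative only: bytes in, bytes out, text everywhere in between."""
    # 1. Input boundary: decode the incoming bytes to text once.
    text = raw_request_body.decode('utf-8')

    # 2. Inside the program: work only with text, so operations such as
    #    case conversion act on characters ("é" -> "É"), not on raw bytes.
    html = u'<td>' + text.upper() + u'</td>'

    # 3. Output boundary: encode the text to bytes once, right before sending.
    return html.encode('utf-8')


body = u'Test-Asset comm\u00e9'.encode('utf-8')
response = handle(body)
print(response == u'<td>TEST-ASSET COMM\u00c9</td>'.encode('utf-8'))  # True
```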

Upvotes: 0
