ruipacheco

Reputation: 16402

Flask, SQLAlchemy and Jinja2 - UnicodeDecodeError

I have a web application that uses Flask, SQLAlchemy and WTForms, along with the necessary Flask extensions to make it all work. MySQL uses the utf8_bin collation for all tables and columns.

I inserted some Chinese characters and phpMyAdmin displays them correctly, but whenever I try to open a page I get the following exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

I understand I should decode('utf8') the fields I want to display but shouldn't this be handled by SQLAlchemy for me?

The only way I managed to make this work was by iterating through the list of results and doing something similar to:

object.property = object.property.decode('utf8')

But obviously this shouldn't have to be done by hand. What am I missing?

Update: SQLAlchemy mapping

class Thread(db.Model):

    __tablename__ = 'Thread'

    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.Unicode(255), nullable=False)
    body = db.Column(db.Text, nullable=True)
    date_created = db.Column(db.DateTime, nullable=False, default=datetime.now)  # pass the callable; datetime.now() would be evaluated only once, at import time
    created_by = db.Column(db.Integer, ForeignKey(User.id))
    user = relationship(User, backref='threads')
    display_hash = db.Column(db.Unicode(255), nullable=False, unique=True)
    display_name = db.Column(db.Unicode(255), nullable=True)
    nsfw = db.Column(db.Boolean, nullable=False, default=False)
    last_updated = db.Column(db.DateTime, nullable=False)

    def __init__(self, title=None, body=None, category_id=None, display_name=None):
        self.title = title
        self.body = body
        self.category_id = category_id
        self.display_name = display_name
        self.display_hash = custom_uuid()
        self.last_updated = self.date_created

    def __repr__(self):
        return u'<Thread %r>' % (self.title)

    def url_title(self):
        """ Generates an ASCII-only slug. """

        result = []
        for word in _punct_re.split(self.title.lower()):
            result.extend(unidecode(word).split())
        return unicode(u'-'.join(result))

Update: stack trace

`127.0.0.1 - - [06/Oct/2013 02:37:15] "GET /index HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/homedirectory/Projects/Assorted/Fruit Show/app/views.py", line 90, in index
    return render_template('index.html', threads=threads, pagination=pagination)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/templating.py", line 128, in render_template
    context, ctx.app)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/flask/templating.py", line 110, in _render
    rv = template.render(context)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/jinja2/environment.py", line 969, in render
    return self.environment.handle_exception(exc_info, True)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/jinja2/environment.py", line 742, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/homedirectory/Projects/Assorted/Fruit Show/app/templates/index.html", line 1, in top-level template code
    {% extends 'base.html' %}
  File "/Users/homedirectory/Projects/Assorted/Fruit Show/app/templates/base.html", line 50, in top-level template code
    {% block content %}
  File "/Users/homedirectory/Projects/Assorted/Fruit Show/app/templates/index.html", line 14, in block "content"
    <a href="{{ url_for('new_thread') }}/{{ thread.display_hash|safe }}/{{ thread.url_title()|safe }}">{{ thread.title|safe }}</a>
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/jinja2/filters.py", line 747, in do_mark_safe
    return Markup(value)
  File "/Users/homedirectory/.virtualenvs/fruitshow/lib/python2.7/site-packages/markupsafe/__init__.py", line 72, in __new__
    return text_type.__new__(cls, base)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)`

Update: URL for project repo:

https://github.com/ruipacheco/fruitshow

Upvotes: 6

Views: 2509

Answers (4)

Charles Merriam

Reputation: 20500

Not quite an answer, but let me recommend ftfy (Fix Text For You), which fixes a number of small Unicode and HTML-escaping issues. One truly annoying religious war in Unicode encoding is the inability of a UTF-8 decoder to deal with the various one-byte character encodings such as Latin-1. Instead of just going "oh, that must be a simple Latin character", the decoder gets flustered. When your database driver makes the observation of "oh, this fits", it creates a fatwa.

Upvotes: 0

ruipacheco

Reputation: 16402

The problem is with the MySQL driver I'm using.

I followed this answer, and switching the column collation from utf8_bin to utf8_general_ci did the trick.

Upvotes: 4

ajknzhol

Reputation: 6450

A small suggestion for the slug field in your models.

There is a library called WebHelpers (https://pypi.python.org/pypi/WebHelpers). Import its urlify function and your title will be converted into a slug automatically.

Install WebHelpers and then import urlify:

from webhelpers.text import urlify
.
.
.
@property
def slug(self):
    return urlify(self.title)
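
If adding a dependency is not an option, a similar ASCII-only slug can be approximated with the standard library. The sketch below is an assumption-laden illustration, not what WebHelpers does internally: ascii_slug is a hypothetical helper name, and because it simply drops anything that has no ASCII decomposition, a purely Chinese title yields an empty string. A transliteration library such as the unidecode call in the question is needed to keep such titles.

```python
import re
import unicodedata

# Anything that is not a lowercase ASCII letter or digit separates words.
_non_alnum_re = re.compile(r"[^a-z0-9]+")


def ascii_slug(title):
    """Build an ASCII-only slug from a text (unicode) title."""
    # NFKD splits accented letters into a base letter plus combining marks.
    normalized = unicodedata.normalize("NFKD", title)
    # Drop every character that cannot be represented in ASCII.
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    words = _non_alnum_re.split(ascii_only.lower())
    # Skip the empty strings produced by leading or trailing separators.
    return "-".join(word for word in words if word)
```

For example, ascii_slug(u"H\u00e9llo W\u00f6rld!") gives "hello-world".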

Upvotes: 2

SingleNegationElimination

Reputation: 156128

Setting the charset in the connection parameters only tells MySQL to transcode columns from however they are stored in the database to the requested encoding. The data is still passed between MySQL and the client as bytes. In short, you have to tell SQLAlchemy that this particular data is Unicode data (in the connection's encoding). For most of your columns you have used Unicode, which serves this purpose. A notable exception is body, which is of type Text. You probably want UnicodeText or Text(convert_unicode=True).
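
As a rough illustration of the bytes-versus-text point above (this is not SQLAlchemy's own mechanism), the snippet below shows why the raw bytes fail under the ascii codec and succeed once decoded with the connection's encoding. It assumes that encoding is UTF-8; to_text is a hypothetical helper and raw_title stands in for a value the driver handed back undecoded. It is written for Python 3. In Python 2.7, as in the question, the failing ascii decode happens implicitly when a byte string is coerced to unicode, which is what the Markup(value) frame in the trace is doing.

```python
# Bytes as a driver might return them for a UTF-8 column (assumed sample data).
raw_title = u"\u65e5\u672c\u8a9e".encode("utf-8")


def to_text(value, encoding="utf-8"):
    """Return text, decoding only when handed bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decoding with the ascii codec reproduces the error from the question.
try:
    raw_title.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the connection's encoding yields usable text.
title = to_text(raw_title)
```

Declaring the column as Unicode or UnicodeText lets SQLAlchemy do this conversion for you, so a helper like this should only be needed for values that still arrive as bytes.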

Upvotes: 0
