Gill Bates
Gill Bates

Reputation: 15217

How to save unicode with SQLAlchemy?

I've encountered such error:

File "/vagrant/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
            cursor.execute(statement, parameters)
        exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 8410: ordinal not in range(128)

It happens when Im trying to save ORM object with assigned Python's unicode string. And as a result dict parameters has a unicode string as one of its values and it produces the error while coercing it to str type.

I've tried to set convert_unicode=True setting on engine and column, but without success.

So what is a good way to handle unicode in SQLAlchemy?

UPDATE

This is some details about my setup:

Table:

                                    Table "public.documents"
   Column   |           Type           |                       Modifiers                        
------------+--------------------------+--------------------------------------------------------
 id         | integer                  | not null default nextval('documents_id_seq'::regclass)
 sha256     | text                     | not null
 url        | text                     | 
 source     | text                     | not null
 downloaded | timestamp with time zone | not null
 tags       | json                     | not null
Indexes:
    "documents_pkey" PRIMARY KEY, btree (id)
    "documents_sha256_key" UNIQUE CONSTRAINT, btree (sha256)

ORM model:

class Document(Base):
    __tablename__ = 'documents'

    id = Column(INTEGER, primary_key=True)
    sha256 = Column(TEXT(convert_unicode=True), nullable=False, unique=True)
    url = Column(TEXT(convert_unicode=True))
    source = Column(TEXT(convert_unicode=True), nullable=False)
    downloaded = Column(DateTime(timezone=True), nullable=False)
    tags = Column(JSON, nullable=False)

SQLAlchemy settngs:

ENGINE = create_engine('postgresql://me:secret@localhost/my_db',
                       encoding='utf8', convert_unicode=True)
Session = sessionmaker(bind=ENGINE)

And the code that produces the error is just creaes a session, instantiates a Document object and saves it with the sourcefieldwithunicode` strign assigned to it.

UPDATE #2

Check this repo - it has automated Vagrant/Ansible setup, and it reproduces this bug.

Upvotes: 6

Views: 15921

Answers (3)

Gunnlaugur Briem
Gunnlaugur Briem

Reputation: 2804

Your problem is here:

$ sudo grep client_encoding /etc/postgresql/9.3/main/postgresql.conf
client_encoding            = sql_ascii

That causes psycopg2 to default to ASCII:

>>> import psycopg2
>>> psycopg2.connect('dbname=dev_db user=dev').encoding
'SQLASCII'

... which effectively shuts off psycopg2's ability to handle Unicode.

You can either fix this in postgresql.conf:

client_encoding = utf8

(and then sudo invoke-rc.d postgresql reload), or you can specify the encoding explicitly when you create the engine:

self._conn = create_engine(src, client_encoding='utf8')

I recommend the former, because the early nineties are long gone. : )

Upvotes: 10

Wichert Akkerman
Wichert Akkerman

Reputation: 5348

I can not reproduce your error. I can provide a few tips regarding unicode handling with SQLAlchemy that may or may not help:

  • Instead of using convert_unicode just use the sqlalchemy.types.Unicode() column type. That will always do the right thing.
  • You are assigning a str instance ('key') to the key column, even though you used convert_unicode=True. You will either want to assign a unicode value, or use a non-unicode code column type.
  • Always check if the encoding for your PostgreSQL database is set correctly to UTF-8.
  • Normally you do not need the encoding and convert_unicode parameters for create_engine.

Upvotes: 0

metatoaster
metatoaster

Reputation: 18948

I can't reproduce your issue (also you didn't include examples on how you are actually adding your items into the database, fault might be there). However, I encourage you to test your code in complete isolation with the rest of your system to see whether what you want to do actually works without the interference of your other code. I created this file solely to test out whether what you wanted to do works, and the main method inserted the relevant object as a row into the database.

# encoding: utf-8

from sqlalchemy import Column, Integer, String, Boolean, Float, Text
from sqlalchemy import Column, INTEGER, TEXT
from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Demo(Base):

    __tablename__ = 'demo'

    id = Column(INTEGER, primary_key=True)
    key = Column(TEXT(convert_unicode=True))
    value = Column(TEXT(convert_unicode=True))


class Backend(object):

    def __init__(self, src=None):
        if not src:
            src = 'sqlite://'

        self._conn = create_engine(src)
        self._metadata = MetaData()
        self._metadata.reflect(bind=self._conn)
        Base.metadata.create_all(self._conn)
        self._sessions = sessionmaker(bind=self._conn)

    def session(self):
        return self._sessions()


def main():
    backend = Backend('postgresql://postgres@localhost/test')
    s = backend.session()
    obj = Demo()
    obj.key = 'test'
    obj.value = u'–test–'
    s.add(obj)
    s.commit()
    return backend

Running this inside the interpreter:

>>> b = main()
>>> s = b.session()
>>> s.query(Demo).get(1).value
u'\u2013test\u2013'

And within psql:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# select * from demo;
 id | key  | value  
----+------+--------
  1 | test | –test–
(1 row)

Sorry that I wasn't able to really help you, but I hope this will point you (or someone else) into figuring out why your code is getting unicode decoding error. Versions of software I used are python-2.7.7, sqlalchemy-0.9.6, psycopg2-2.5.3, postgresql-9.3.4.

Upvotes: 3

Related Questions