Reputation: 15217
I've encountered such error:
File "/vagrant/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
cursor.execute(statement, parameters)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 8410: ordinal not in range(128)
It happens when Im trying to save ORM object with assigned Python's unicode
string. And as a result dict
parameters
has a unicode string as one of its values and it produces the error while coercing it to str
type.
I've tried to set convert_unicode=True
setting on engine and column, but without success.
So what is a good way to handle unicode in SQLAlchemy?
This is some details about my setup:
Table:
Table "public.documents"
Column | Type | Modifiers
------------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('documents_id_seq'::regclass)
sha256 | text | not null
url | text |
source | text | not null
downloaded | timestamp with time zone | not null
tags | json | not null
Indexes:
"documents_pkey" PRIMARY KEY, btree (id)
"documents_sha256_key" UNIQUE CONSTRAINT, btree (sha256)
ORM model:
class Document(Base):
__tablename__ = 'documents'
id = Column(INTEGER, primary_key=True)
sha256 = Column(TEXT(convert_unicode=True), nullable=False, unique=True)
url = Column(TEXT(convert_unicode=True))
source = Column(TEXT(convert_unicode=True), nullable=False)
downloaded = Column(DateTime(timezone=True), nullable=False)
tags = Column(JSON, nullable=False)
SQLAlchemy settngs:
ENGINE = create_engine('postgresql://me:secret@localhost/my_db',
encoding='utf8', convert_unicode=True)
Session = sessionmaker(bind=ENGINE)
And the code that produces the error is just creaes a session, instantiates a Document
object and saves it with the source
fieldwith
unicode` strign assigned to it.
Check this repo - it has automated Vagrant/Ansible setup, and it reproduces this bug.
Upvotes: 6
Views: 15921
Reputation: 2804
Your problem is here:
$ sudo grep client_encoding /etc/postgresql/9.3/main/postgresql.conf
client_encoding = sql_ascii
That causes psycopg2 to default to ASCII:
>>> import psycopg2
>>> psycopg2.connect('dbname=dev_db user=dev').encoding
'SQLASCII'
... which effectively shuts off psycopg2's ability to handle Unicode.
You can either fix this in postgresql.conf:
client_encoding = utf8
(and then sudo invoke-rc.d postgresql reload
), or you can specify the encoding explicitly when you create the engine:
self._conn = create_engine(src, client_encoding='utf8')
I recommend the former, because the early nineties are long gone. : )
Upvotes: 10
Reputation: 5348
I can not reproduce your error. I can provide a few tips regarding unicode handling with SQLAlchemy that may or may not help:
convert_unicode
just use the sqlalchemy.types.Unicode() column type. That will always do the right thing.'key'
) to the key
column, even though you used convert_unicode=True
. You will either want to assign a unicode value, or use a non-unicode code column type.encoding
and convert_unicode
parameters for create_engine.Upvotes: 0
Reputation: 18948
I can't reproduce your issue (also you didn't include examples on how you are actually adding your items into the database, fault might be there). However, I encourage you to test your code in complete isolation with the rest of your system to see whether what you want to do actually works without the interference of your other code. I created this file solely to test out whether what you wanted to do works, and the main method inserted the relevant object as a row into the database.
# encoding: utf-8
from sqlalchemy import Column, Integer, String, Boolean, Float, Text
from sqlalchemy import Column, INTEGER, TEXT
from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class Demo(Base):
__tablename__ = 'demo'
id = Column(INTEGER, primary_key=True)
key = Column(TEXT(convert_unicode=True))
value = Column(TEXT(convert_unicode=True))
class Backend(object):
def __init__(self, src=None):
if not src:
src = 'sqlite://'
self._conn = create_engine(src)
self._metadata = MetaData()
self._metadata.reflect(bind=self._conn)
Base.metadata.create_all(self._conn)
self._sessions = sessionmaker(bind=self._conn)
def session(self):
return self._sessions()
def main():
backend = Backend('postgresql://postgres@localhost/test')
s = backend.session()
obj = Demo()
obj.key = 'test'
obj.value = u'–test–'
s.add(obj)
s.commit()
return backend
Running this inside the interpreter:
>>> b = main()
>>> s = b.session()
>>> s.query(Demo).get(1).value
u'\u2013test\u2013'
And within psql:
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# select * from demo;
id | key | value
----+------+--------
1 | test | –test–
(1 row)
Sorry that I wasn't able to really help you, but I hope this will point you (or someone else) into figuring out why your code is getting unicode decoding error. Versions of software I used are python-2.7.7, sqlalchemy-0.9.6, psycopg2-2.5.3, postgresql-9.3.4.
Upvotes: 3