midrare

Reputation: 2764

Setting query results encoding in cx_Oracle / UnicodeDecodeError with Chinese characters

I'm working with a database containing a lot of Chinese characters. My code goes something like this:

import cx_Oracle

# username, password, host, port and service_name are defined elsewhere
connection = cx_Oracle.connect("%s/%s@%s:%s/%s" % (username, password, host, port, service_name))
cursor = connection.cursor()
cursor.execute('SELECT HOTEL_ID,CREATE_TIME,SOURCE,CONTENT,TITLE,RATE,UPDATE_TIME FROM T_FX_COMMENTS')

for row in cursor:
    # Stuff goes here
    pass

But I get this error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    for row in cursor:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 26: illegal multibyte sequence

It seems GBK is not enough: the data apparently contains byte sequences that GBK cannot decode. I want cx_Oracle to give me GB18030-encoded results instead of GBK. How do I do this?

cx_Oracle.Connection.encoding is read-only, and I haven't found anything in the cx_Oracle documentation that suggests the encoding can be changed.

I'm on Python 3.3.2 and cx_Oracle 5.1.2. There must be something I'm missing here. Help is appreciated!
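For readers who want to see how this kind of error can arise, here is a minimal sketch that needs no database. It assumes the offending data is a GB18030 four-byte sequence, which is only one possible cause. The character U+20000 is picked purely for illustration and is not taken from the table above.

```python
# U+20000 is a CJK Extension B ideograph, used here only as an example of a
# character that GB18030 can represent but GBK cannot.
ch = "\U00020000"

raw = ch.encode("gb18030")          # GB18030 uses a four-byte sequence here
roundtrip = raw.decode("gb18030")   # decoding with the same codec succeeds

try:
    raw.decode("gbk")               # GBK has no four-byte sequences
    gbk_ok = True
except UnicodeDecodeError as exc:
    gbk_ok = False
    print("gbk failed:", exc)

print(len(raw), roundtrip == ch, gbk_ok)
```

If the client character set is GBK while the data needs GB18030, the decode step fails in the same way while rows are being fetched.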

Upvotes: 8

Views: 6288

Answers (3)

Hector

Reputation: 1

Use this:

import os
os.environ["NLS_LANG"] = ".zhs16gbk"

NLS_LANG is read by the Oracle client, so its value must use Oracle's character set names (such as ZHS16GBK), not Python codec names. This solved the problem for me with Python 2.6.8 and Oracle 11g.
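As a rough sketch of the "use Oracle's names" point, the helper below translates a Python codec name into an NLS_LANG value. The function name nls_lang_for and the table ORACLE_CHARSETS are made up for this example. The table covers only three character sets and is not exhaustive.

```python
import codecs
import os

# Python codec name -> Oracle character set name (small, non-exhaustive table).
ORACLE_CHARSETS = {
    "gbk": "ZHS16GBK",
    "gb18030": "ZHS32GB18030",
    "utf-8": "AL32UTF8",
}


def nls_lang_for(python_codec):
    """Return an NLS_LANG value that sets only the client character set."""
    # codecs.lookup normalises aliases, e.g. "GBK" -> "gbk", "utf8" -> "utf-8".
    canonical = codecs.lookup(python_codec).name
    return "." + ORACLE_CHARSETS[canonical]


# The Oracle client reads NLS_LANG from the process environment, so set it
# before the client is initialised (safest: before importing cx_Oracle).
os.environ["NLS_LANG"] = nls_lang_for("gb18030")
print(os.environ["NLS_LANG"])
```

A codec missing from the table raises KeyError, which is probably better than passing Oracle a name it does not recognise.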

Upvotes: 0

daveoncode

Reputation: 19578

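I was facing the same issue and solved it by setting the NLS_LANG environment variable to .AL32UTF8. AL32UTF8 is Oracle's name for UTF-8, and the leading dot leaves the language and territory parts at their defaults, so the value works as a sort of "wildcard" that says "use UTF-8 for any language".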
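To show why the leading dot works, here is a small sketch that splits an NLS_LANG value of the general form LANGUAGE_TERRITORY.CHARSET into its parts. The helper name split_nls_lang is hypothetical, and the parsing is deliberately naive. It only illustrates the layout and is not how the Oracle client itself parses the value.

```python
def split_nls_lang(value):
    """Split an NLS_LANG value into (language, territory, charset)."""
    # Everything after the first dot is the client character set.
    locale_part, _, charset = value.partition(".")
    # The part before the dot is LANGUAGE_TERRITORY; either may be empty.
    language, _, territory = locale_part.partition("_")
    return language, territory, charset


for value in (".AL32UTF8", "AMERICAN_AMERICA.AL32UTF8",
              "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"):
    print(value, "->", split_nls_lang(value))
```

With ".AL32UTF8" the language and territory come back empty, which is the case where the client falls back to its defaults for those two parts.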

Upvotes: 7

Maciek

Reputation: 3234

Try setting the NLS_LANG environment variable at the beginning of your program:

import os
# Oracle's character set name for GB18030 is ZHS32GB18030
os.environ["NLS_LANG"] = ".ZHS32GB18030"

Upvotes: 2
