dangelsaurus
dangelsaurus

Reputation: 7532

SQLAlchemy codec can't decode byte 0x81 with Oracle WE8ISO8859P1 encoding

Disclaimer: I struggle with encodings.

I'm using SQLAlchemy to pull some data from an Oracle 12 database with WE8ISO8859P1 character set (according to NLS_CHARACTERSET)

Somewhere there is a value in the database (assuming a person's name) that when the value is passed to Python, it throws an error.

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4: character maps to <undefined>

Code is:

Base = declarative_base()

class Person(Base):
    __tablename__= 'PERSON'

    id = Column(Integer,primary_key=True)
    lastname = Column(String)
    firstname = Column(String)
    middlename = Column(String)
    active = Column(Integer)  
    sex = Column(String)
    dateofbirth = Column(String)

engine = create_engine('oracle://USER:[email protected]:1521/ORCL', echo=True)

Session = sessionmaker(bind=engine)

session = Session()

testList = []


for user in session.query(Person).all():
    testList.append(user)

Checking in oracle v$session_connect_info I see that the client running this code is connecting with character set WE8MSWIN1252

I know Python uses Unicode, so it looks like I have 3 different encodings that aren't matching, and I really don't know where to start.

Should I...

  1. Try to change the encoding on the oracle client (Windows machine, yet I see other Windows clients connected with UTF-8)
  2. Try to change the encoding in the create_engine script? (I tried to pass encoding='WE8ISO8859P1' and it didn't accept that.
  3. Try to catch the error and change the value to something else.

Bonus: What exactly is 0x81? Maybe that shouldn't even be encoded in a person's name?

Upvotes: 1

Views: 2308

Answers (1)

Anthony Tuininga
Anthony Tuininga

Reputation: 7086

You have a few options:

conn = cx_Oracle.connect("user/pw@tns", encoding = "ISO-8859-1", nencoding = "UTF-8")

Or you can simply set the environment variables NLS_LANG and NLS_NCHAR

NLS_LANG=.WE8ISO8859P1
NLS_LNCHAR=AL32UTF8

Note that you can also use encoding = "UTF-8" in the cx_Oracle connect() method as ISO-8859-1 is easily converted to UTF-8. cx_Oracle will automatically take care of the conversion to string if you are using Python 3.

Upvotes: 3

Related Questions