Reputation: 7532
Disclaimer: I struggle with encodings.
I'm using SQLAlchemy to pull some data from an Oracle 12 database with WE8ISO8859P1 character set (according to NLS_CHARACTERSET)
Somewhere there is a value in the database (assuming a person's name) that when the value is passed to Python, it throws an error.
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4: character maps to <undefined>
Code is:
Base = declarative_base()
class Person(Base):
__tablename__= 'PERSON'
id = Column(Integer,primary_key=True)
lastname = Column(String)
firstname = Column(String)
middlename = Column(String)
active = Column(Integer)
sex = Column(String)
dateofbirth = Column(String)
engine = create_engine('oracle://USER:[email protected]:1521/ORCL', echo=True)
Session = sessionmaker(bind=engine)
session = Session()
testList = []
for user in session.query(Person).all():
testList.append(user)
Checking in oracle v$session_connect_info I see that the client running this code is connecting with character set WE8MSWIN1252
I know Python uses Unicode, so it looks like I have 3 different encodings that aren't matching, and I really don't know where to start.
Should I...
create_engine
script? (I tried to pass encoding='WE8ISO8859P1'
and it didn't accept that.Bonus: What exactly is 0x81? Maybe that shouldn't even be encoded in a person's name?
Upvotes: 1
Views: 2308
Reputation: 7086
You have a few options:
conn = cx_Oracle.connect("user/pw@tns", encoding = "ISO-8859-1", nencoding = "UTF-8")
Or you can simply set the environment variables NLS_LANG and NLS_NCHAR
NLS_LANG=.WE8ISO8859P1
NLS_LNCHAR=AL32UTF8
Note that you can also use encoding = "UTF-8" in the cx_Oracle connect() method as ISO-8859-1 is easily converted to UTF-8. cx_Oracle will automatically take care of the conversion to string if you are using Python 3.
Upvotes: 3