Reputation: 10971
I'm working on a Python script that reads in a CSV and writes the contents out to an Oracle database using cx_Oracle. So far I've been getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1369: ordinal not in range(128)
Evidently, cx_Oracle is trying to convert a Unicode character to ASCII and it's not working.
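For what it's worth, the failing step is reproducible in plain Python, since '\xa0' is a non-breaking space with no ASCII representation:

# U+00A0 (non-breaking space) has no ASCII equivalent, so this raises the same error
'\xa0'.encode('ascii')
# UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 0: ordinal not in range(128)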
A few clarifying points:

- I'm opening the file with open('all.csv', encoding='utf8').
- I'm using NVARCHAR2 fields for text in the database, and the NLS_NCHAR_CHARACTERSET is set to AL16UTF16. The NLS_CHARACTERSET is WE8MSWIN1252, but from what I understand that shouldn't be relevant since I'm using NVARCHAR2.
- I've tried setting the NLS_LANG environment variable to things like .AL16UTF16, _.AL16UTF16, and AMERICAN_AMERICA.WE8MSWIN1252 per this post, but I still get the same error. (I'm setting the variable roughly as in the sketch below.)

Given that I'm reading a UTF-8 file and trying to write to a Unicode-encoded table, can anyone think of why cx_Oracle would still be trying to convert my data to ASCII?
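In case it's relevant, here's how I'm setting the variable, before anything connects (with the values above swapped in):

import os

# NLS_LANG has to be in the environment before the Oracle client first connects
os.environ['NLS_LANG'] = '.AL16UTF16'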
I'm able to produce the error with this code:
import csv

import datum

field_map = {
    # output field -> source CSV column (contents elided)
    ...
}

with open('all.csv', encoding='utf8') as f:
    reader = csv.DictReader(f)
    out_rows = []
    i = 0  # row counter; cap the test run at 1000 rows
    for row in reader:
        if i == 1000:
            break
        out_row = {}
        for field, source_field in field_map.items():
            out_val = row[source_field]
            out_row[field] = out_val
        out_rows.append(out_row)
        i += 1

out_db = datum.connect('oracle-stgeom://user:pass@db')
out_table = out_db['service_requests']
out_table.write(out_rows, chunk_size=10000)
The datum module is a data abstraction library I'm working on. The function responsible for writing to the Oracle table is found here.
The full traceback is:
File "C:\Projects\311\write.py", line 64, in <module>
out_table.write(out_rows, chunk_size=10000)
File "z:\datum\datum\table.py", line 89, in write
self._child.write(rows, from_srid=from_srid, chunk_size=chunk_size)
File "z:\datum\datum\oracle_stgeom\table.py", line 476, in write
self._c.executemany(None, val_rows)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1361: ordinal not in range(128)
Upvotes: 2
Views: 2075
Reputation: 7086
Check the value of the "encoding" and "nencoding" attributes on the connection. These values are set by calling OCI routines that check the environment variables NLS_LANG and NLS_NCHAR. It sounds like, in your case, the value is US-ASCII or equivalent. When writing to the database, cx_Oracle takes the text and encodes it into a byte string using the encoding the Oracle client is expecting. Note that this is unrelated to the database encoding. In general, for best performance, it is a good idea to match the database and client encodings -- but if you don't, Oracle will quite happily convert between the two, provided all of the characters used can be represented in both character sets!
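A minimal check looks something like this (the connection string is a placeholder):

import cx_Oracle

con = cx_Oracle.connect('user/pass@db')
print(con.encoding)   # client encoding used for regular (VARCHAR2) string binds
print(con.nencoding)  # client encoding used for NCHAR/NVARCHAR2 binds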
Note that if the value of NLS_LANG is invalid, it is essentially ignored, and AL16UTF16 is one such invalid entry! So set it to the value you would like (such as .AL32UTF8) and check the values of encoding and nencoding on the connection until you get what you want.
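For example, via the environment variable (a sketch; it has to be set before the first connection is made, and the credentials are placeholders):

import os
os.environ['NLS_LANG'] = '.AL32UTF8'  # set before cx_Oracle first connects

import cx_Oracle

con = cx_Oracle.connect('user/pass@db')
print(con.encoding, con.nencoding)  # verify the client actually picked it up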
Note as well that unless you state otherwise, all strings bound via cx_Oracle to the database are assumed to be in the normal encoding, not the NCHAR encoding. You can override this by using cursor.setinputsizes() and specifying that the input type is NCHAR, FIXED_NCHAR or LONG_NCHAR.
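A minimal sketch of that, assuming a two-column insert (the column names field_a and field_b are made up; the table name comes from your traceback, and the credentials are placeholders):

import cx_Oracle

con = cx_Oracle.connect('user/pass@db')
cur = con.cursor()

val_rows = [('hello', 'non\xa0breaking'), ('plain', 'ascii')]  # sample data

# Declare each bind position as NCHAR so cx_Oracle uses the NCHAR encoding
cur.setinputsizes(cx_Oracle.NCHAR, cx_Oracle.NCHAR)
cur.executemany(
    "insert into service_requests (field_a, field_b) values (:1, :2)",
    val_rows)
con.commit()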
Upvotes: 2