bertday

Reputation: 10971

Can't write Unicode text using cx_Oracle

I'm working on a Python script that reads in a CSV and writes the contents out to an Oracle database using cx_Oracle. So far I've been getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1369: ordinal not in range(128)

Evidently, cx_Oracle is trying to convert a Unicode character to ASCII and it's not working.
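The failure can be reproduced without Oracle at all: the `'\xa0'` in the message is a non-breaking space, which the ASCII codec cannot represent. A minimal sketch (with a made-up field value):

```python
# '\xa0' is a non-breaking space -- common in CSV exports -- and is
# outside the ASCII range, so encoding it with the ASCII codec fails
# exactly like the traceback shows.
text = "Market\xa0St"  # hypothetical field value containing U+00A0

try:
    text.encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True

# The same text encodes without complaint as UTF-8 (NBSP -> 0xC2 0xA0).
utf8_bytes = text.encode("utf-8")
```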

A few clarifying points: the CSV is UTF-8-encoded, and the target table uses a Unicode character set.

Given that I'm reading a UTF-8 file and trying to write to a Unicode-encoded table, can anyone think of why cx_Oracle would still be trying to convert my data to ASCII?

I'm able to produce the error with this code:

import csv

import datum  # data abstraction library (see below)

field_map = {
    ...
}

with open('all.csv', encoding='utf8') as f:
    reader = csv.DictReader(f)
    out_rows = []

    for i, row in enumerate(reader):
        if i == 1000:
            break

        out_row = {}
        for field, source_field in field_map.items():
            out_row[field] = row[source_field]

        out_rows.append(out_row)

out_db = datum.connect('oracle-stgeom://user:pass@db')
out_table = out_db['service_requests']
out_table.write(out_rows, chunk_size=10000)

The datum module is a data abstraction library I'm working on. The function responsible for writing to the Oracle table is found here.

The full traceback is:

  File "C:\Projects\311\write.py", line 64, in <module>
    out_table.write(out_rows, chunk_size=10000)
  File "z:\datum\datum\table.py", line 89, in write
    self._child.write(rows, from_srid=from_srid, chunk_size=chunk_size)
  File "z:\datum\datum\oracle_stgeom\table.py", line 476, in write
    self._c.executemany(None, val_rows)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1361: ordinal not in range(128)

Upvotes: 2

Views: 2075

Answers (1)

Anthony Tuininga

Reputation: 7086

Check the value of the "encoding" and "nencoding" attributes on the connection. These values are set by calling OCI routines that check the environment variables NLS_LANG and NLS_NCHAR. It looks like your value is US-ASCII or equivalent. When writing to the database, cx_Oracle takes the text and gets a byte string by encoding it in the character set the Oracle client is expecting. Note that this is unrelated to the database encoding. In general, for best performance, it is a good idea to match the database and client encodings -- but if you don't, Oracle will quite happily convert between the two, provided all of the characters used can be represented in both character sets!
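A quick way to inspect those attributes (a sketch; `conn` would be any open cx_Oracle connection):

```python
def client_encodings(conn):
    # Report the client-side character sets cx_Oracle will use when
    # encoding str values for binds: `encoding` for CHAR/VARCHAR2
    # columns, `nencoding` for NCHAR/NVARCHAR2 columns.
    return {"encoding": conn.encoding, "nencoding": conn.nencoding}
```

If this reports US-ASCII (or equivalent) for `encoding`, any bound character above U+007F will raise the UnicodeEncodeError seen in the question.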

Note that if the value of NLS_LANG is invalid, it is essentially ignored. AL16UTF16 is one such invalid entry! So set it to the value you would like (such as .AL32UTF8) and check the encoding and nencoding attributes on the connection until you get what you want.
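For example (a sketch; the connection string is hypothetical, and NLS_LANG must be set before cx_Oracle first loads the Oracle client libraries):

```python
import os

# A leading "." sets only the character-set part of NLS_LANG and keeps
# the default language/territory.
os.environ["NLS_LANG"] = ".AL32UTF8"

# Then connect and verify what the client actually picked up:
# import cx_Oracle
# conn = cx_Oracle.connect("user/pass@db")
# print(conn.encoding, conn.nencoding)
```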

Note as well that unless you state otherwise, all strings bound via cx_Oracle to the database are assumed to be in the normal encoding, not the NCHAR encoding. You can override this by using cursor.setinputsizes() and specifying that the input type is NCHAR, FIXED_NCHAR or LONG_NCHAR.
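As a sketch of that override (the table and column names are hypothetical; on a real connection you would pass `cx_Oracle.NCHAR`, `cx_Oracle.FIXED_NCHAR`, or `cx_Oracle.LONG_NCHAR` as `nchar_type`):

```python
def write_nchar(cursor, nchar_type, value):
    # Force the bind variable to the national character set so the
    # value is encoded with `nencoding` rather than `encoding`.
    cursor.setinputsizes(nchar_type)
    cursor.execute(
        "INSERT INTO service_requests (description) VALUES (:1)",
        [value],
    )
```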

Upvotes: 2
