Reputation: 7546

Pandas to Oracle via SQL Alchemy: UnicodeEncodeError: 'ascii' codec can't encode character

Using pandas 18.1...

I'm trying to iterate through a folder of CSVs to read each CSV and send it to an Oracle database table. There is a non-ascii character lurking in one of my many CSVs (more like reveling in my anguish). I keep getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xab' in position 8: 
ordinal not in range(128)

Here's the code:

import pandas as pd
import pandas.io.sql as psql
from sqlalchemy import create_engine
import cx_Oracle as cx

engine = create_engine('oracle+cx_oracle://schema:'+pwd+'@server:port/service_name'
,encoding='latin1')

name='table'
path=r'path_to_folder'
filelist = os.listdir(path) 

for file in filelist:
    df = pd.read_csv(pathc+'\\'+file,encoding='latin1',index_col=0)
    df=df.astype('unicode')
    df['date'] = pd.to_datetime(df['date'])
    df['date'] = pd.to_datetime(df['Contract_EffDt'],format='%YYYY-%mm-%dd')
    df.to_sql(name, engine, if_exists = 'append')

I've tried the following:

encoding=utf-8 (in the engine, if I do that in read_csv, it throws an error)
Adding ?encoding=utf8 after "service_name" in the engine
Using df=df.astype('unicode') (and not)

What I want to do: Replace the unreadable character with something else and, most importantly, proceed with sending data to Oracle.

Note:

The data file I'm using are from the cms.gov site. Here's a zip file with an example. I'm using the "contracts_info" file.

Thanks in advance!

Upvotes: 2

Answers (2)

Volokh

Reputation: 390

You need to set the NLS_LANG environment variable like this:

os.environ['NLS_LANG']= 'AMERICAN_AMERICA.AL32UTF8'

Then the error won't occur.

Upvotes: 5

Dance Party2

Reputation: 7546

I encoded string fields to utf-8 individually and this may have helped (a new error occurred, but I assume it is not related to this):

dfc['Organization Type'] = dfc['Organization Type'].str.encode('utf-8')

New error:

DatabaseError: (cx_Oracle.DatabaseError) ORA-00904: "Contract_ID": invalid identifier

This was because "Contract_ID" was not set as the index. Once I did that, all went well (except for being slower than molasses, which begins my next adventure).

Upvotes: 2

Pandas to Oracle via SQL Alchemy: UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character

Answers (2)

Related Questions

Pandas to Oracle via SQL Alchemy: UnicodeEncodeError: 'ascii' codec can't encode character