Reputation: 7536
Using pandas 18.1...
I'm trying to iterate through a folder of CSVs to read each CSV and send it to an Oracle database table. There is a non-ascii character lurking in one of my many CSVs (more like reveling in my anguish). I keep getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xab' in position 8:
ordinal not in range(128)
Here's the code:
import pandas as pd
import pandas.io.sql as psql
from sqlalchemy import create_engine
import cx_Oracle as cx
engine = create_engine('oracle+cx_oracle://schema:'+pwd+'@server:port/service_name'
,encoding='latin1')
name='table'
path=r'path_to_folder'
filelist = os.listdir(path)
for file in filelist:
df = pd.read_csv(pathc+'\\'+file,encoding='latin1',index_col=0)
df=df.astype('unicode')
df['date'] = pd.to_datetime(df['date'])
df['date'] = pd.to_datetime(df['Contract_EffDt'],format='%YYYY-%mm-%dd')
df.to_sql(name, engine, if_exists = 'append')
I've tried the following:
What I want to do: Replace the unreadable character with something else and, most importantly, proceed with sending data to Oracle.
Note:
The data file I'm using are from the cms.gov site. Here's a zip file with an example. I'm using the "contracts_info" file.
Thanks in advance!
Upvotes: 2
Views: 2666
Reputation: 390
You need to set the NLS_LANG environment variable like this:
os.environ['NLS_LANG']= 'AMERICAN_AMERICA.AL32UTF8'
Then the error won't occur.
Upvotes: 5
Reputation: 7536
I encoded string fields to utf-8 individually and this may have helped (a new error occurred, but I assume it is not related to this):
dfc['Organization Type'] = dfc['Organization Type'].str.encode('utf-8')
New error:
DatabaseError: (cx_Oracle.DatabaseError) ORA-00904: "Contract_ID": invalid identifier
This was because "Contract_ID" was not set as the index. Once I did that, all went well (except for being slower than molasses, which begins my next adventure).
Upvotes: 2