Rohit

Reputation: 97

Encoding issue with the Unicode character “ü” when reading from an Oracle database using cx_Oracle and pandas

I am trying to read the output of an Oracle table into a dataframe, which I then need to compare against another dataframe.

The Oracle table contains a string value with the Unicode character “ü”, which appears as 'u' in the dataframe.

Code I tried:

import pandas as pd
import cx_Oracle

conn = cx_Oracle.makedsn(host='hostname', port='1521', service_name='SomeName')
sqlconn = cx_Oracle.connect(user='Username', password='$$$$$', dsn=conn)
sqlquery = "Select statement"
df2 = pd.read_sql(sqlquery, sqlconn)

print(df2)
**UBERX**,2003-10-01 00:00:00,I,N/A,Not Available

Expected:
**ÜBERX**,2003-10-01 00:00:00,I,N/A,Not Available

If I export the output to CSV:

df2.to_csv('/home/user/05June_1_ORA.csv', index=False)

On the Unix machine, the exported file is reported as ASCII:

bash-4.2$ file -i *
05June_1_ORA.csv: text/plain; charset=us-ascii

This data is ingested into Oracle from a CSV file, and that file's encoding is UTF-8:

sourcefile_05June_1.csv:     text/plain; charset=utf-8

Please let me know how I can resolve this.

Upvotes: 1

Views: 633

Answers (1)

Anthony Tuininga

Reputation: 7086

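When you connect to the database, ensure that you set the encoding. If it is not specified, cx_Oracle uses the NLS_LANG environment variable, and falls back to ASCII when that is not set, which is why “Ü” comes back as 'U'. UTF-8 will become the default in cx_Oracle 8, but for now, do this: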

sqlconn = cx_Oracle.connect(user='Username', password='$$$$$', dsn=conn,
        encoding="UTF-8", nencoding="UTF-8")
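
To confirm the fix without a shell, the check that `file -i` performs in the question can be approximated in Python. This is only a rough sketch: `guess_charset` and `rows_to_csv_bytes` are hypothetical helper names, not part of cx_Oracle or pandas, and the classification is much cruder than what `file` does. The sample rows are taken from the output shown in the question.

```python
import csv
import io


def guess_charset(data: bytes) -> str:
    """Crude stand-in for `file -i` (assumption: only three outcomes matter here)."""
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def rows_to_csv_bytes(rows) -> bytes:
    """Serialise rows to CSV text, then encode as UTF-8 (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


# Row as it should arrive once the connection uses UTF-8.
good = rows_to_csv_bytes([["ÜBERX", "2003-10-01 00:00:00", "I", "N/A", "Not Available"]])
# Row as it arrived in the question, with the umlaut already lost.
bad = rows_to_csv_bytes([["UBERX", "2003-10-01 00:00:00", "I", "N/A", "Not Available"]])

print(guess_charset(good))
print(guess_charset(bad))
```

If the exported data still classifies as us-ascii after reconnecting with the encoding set, the character was most likely lost before it reached pandas, so the connection settings are the place to look rather than `to_csv`.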

Upvotes: 3
