Reputation: 861
I'm trying to read data from an Oracle table that has one CLOB column. I'm able to create a dataframe with the data but I have to maintain the connection to the database.
df = pd.read_sql("""
select
COURSE_NUMB,
COURSE_DESCRIPTION
from COURSE_DATA
""",conn)
df.head(1)
COURSE_NUMB COURSE_DESCRIPTION
1 Lorem ipsum dolor sit amet, consectetur....
After I close the connection to the database, the CLOB data is no longer available.
conn.close()
df.head()
DatabaseError: DPI-1040: LOB was already closed
I've tried making copies of the dataframe or exporting it to a json file, but the CLOB data is still missing in the copy or export.
Upvotes: 3
Views: 11894
Reputation: 133
There is a more efficient way to handle clob data with python. Just add an ouptput handler to the connection object.
first create your clob handler function, that will convert any incoming CLOB to String.
def output_type_handler(cursor, name, defaultType, size, precision, scale):
if defaultType == cx_Oracle.CLOB:
return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)
Then update you connection object:
conn = cx_Oracle.connect(user='psw', password='user', dsn=dsn_tns)
conn.outputtypehandler = output_type_handler
Now everytime the cursor encounters a clob object, it will automatically convert it, and it's way more efficient than using the apply function on every record or reading the object. Unfortunately I haven't found any workable solution to add the outputhandler to the updated oracle ads library which is much faster than the normal pandas.read_sql function.
Upvotes: 1
Reputation: 2297
We have to invoke read() on the CLOB object to read its value into the DataFrame:
### Set up table with CLOB data
SQL> DESC course_data
Name Null? Type
------------------ ----- ------
COURSE_NUMB NUMBER
COURSE_DESCRIPTION CLOB
SQL> SELECT * FROM course_data;
COURSE_NUMB COURSE_DESCRIPTION
----------- -------------------------------------------
1 Lorem ipsum dolor sit amet, consectetur....
### Python code
df = pd.DataFrame()
cursor = conn.cursor()
cursor.execute(""" select COURSE_NUMB, COURSE_DESCRIPTION from COURSE_DATA """)
for row in cursor:
# row[0] = COURSE_NUMB
# row[1] = COURSE_DESCRIPTION - note the call to "read()" on the CLOB
df_tmp = pd.DataFrame([[row[0], row[1].read()]],
columns=["COURSE_NUMB", "COURSE_DESCRIPTION"])
df = df.append(df_tmp, ignore_index=True)
print ("***Before conn.close()")
print(df.head(1))
conn.close()
print ("\n***After conn.close()")
print(df.head())
### Output
$ python test.py
***Before conn.close()
COURSE_NUMB COURSE_DESCRIPTION
0 1 Lorem ipsum dolor sit amet, consectetur....
***After conn.close()
COURSE_NUMB COURSE_DESCRIPTION
0 1 Lorem ipsum dolor sit amet, consectetur....
Upvotes: 3