Dread
Dread

Reputation: 861

python read CLOB data

I'm trying to read data from an Oracle table that has one CLOB column. I'm able to create a dataframe with the data but I have to maintain the connection to the database.

df = pd.read_sql("""
select 
COURSE_NUMB,
COURSE_DESCRIPTION
from COURSE_DATA
""",conn)

df.head(1)

COURSE_NUMB   COURSE_DESCRIPTION
1             Lorem ipsum dolor sit amet, consectetur....

After I close the connection to the database, the CLOB data is no longer available.

conn.close()
df.head()

DatabaseError: DPI-1040: LOB was already closed

I've tried making copies of the dataframe or exporting it to a json file, but the CLOB data is still missing in the copy or export.

Upvotes: 3

Views: 11894

Answers (2)

Mohamed Amin Chairi
Mohamed Amin Chairi

Reputation: 133

There is a more efficient way to handle clob data with python. Just add an ouptput handler to the connection object.

first create your clob handler function, that will convert any incoming CLOB to String.

def output_type_handler(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

Then update you connection object:

conn = cx_Oracle.connect(user='psw', password='user', dsn=dsn_tns)
conn.outputtypehandler = output_type_handler

Now everytime the cursor encounters a clob object, it will automatically convert it, and it's way more efficient than using the apply function on every record or reading the object. Unfortunately I haven't found any workable solution to add the outputhandler to the updated oracle ads library which is much faster than the normal pandas.read_sql function.

Upvotes: 1

cdub
cdub

Reputation: 2297

We have to invoke read() on the CLOB object to read its value into the DataFrame:

### Set up table with CLOB data
SQL> DESC course_data
Name               Null? Type   
------------------ ----- ------ 
COURSE_NUMB              NUMBER 
COURSE_DESCRIPTION       CLOB   

SQL> SELECT * FROM course_data;

COURSE_NUMB COURSE_DESCRIPTION                                                              
----------- -------------------------------------------
          1 Lorem ipsum dolor sit amet, consectetur....   

### Python code
df = pd.DataFrame()

cursor = conn.cursor()
cursor.execute(""" select COURSE_NUMB, COURSE_DESCRIPTION from COURSE_DATA """)

for row in cursor:
  # row[0] = COURSE_NUMB
  # row[1] = COURSE_DESCRIPTION - note the call to "read()" on the CLOB
  df_tmp = pd.DataFrame([[row[0], row[1].read()]],
                        columns=["COURSE_NUMB", "COURSE_DESCRIPTION"])
  df = df.append(df_tmp, ignore_index=True)

print ("***Before conn.close()")
print(df.head(1))
conn.close()
print ("\n***After conn.close()")
print(df.head())

### Output
$ python test.py
***Before conn.close()
   COURSE_NUMB                           COURSE_DESCRIPTION
0            1  Lorem ipsum dolor sit amet, consectetur....

***After conn.close()
   COURSE_NUMB                           COURSE_DESCRIPTION
0            1  Lorem ipsum dolor sit amet, consectetur....

Upvotes: 3

Related Questions