kgs_doc

Reputation: 11

Importing Complex Pandas Dataframe into Oracle Table

I have a CSV with 74 fields of mixed types, including dates, and 10 rows, with data in some columns but not most. The empty cells in the CSV are truly empty (no NaN, NULL, or coded values for missing data). I'm trying to push this into an established table in Oracle. By default, empty cells in a dataframe become NaN regardless of the field's format, and the issue with this is that Oracle doesn't accept NaN values; it accepts NULL. You can't fill pandas empties with a literal NULL, so that seems to be the problem. I'm using the cx_Oracle library to create a connection to the db and have used it elsewhere with simple variable imputation, but this is the first time I've loaded an entire df into Oracle.

I have tried using sqlalchemy to make the connection and its to_sql() function to convert the df to something Oracle will like, but I ran into db connection issues. Since I can connect with cx_Oracle, that's what I'm pursuing here.

Can I not load a bunch of empty cells into an Oracle table? If not, how should I convert empty cells into NULL so that pandas will load them as NULL into Oracle?
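One common recipe, sketched here as an assumption rather than tested against this exact table (the column names and sample values below are illustrative, not from the real CSV): replace every NaN with Python None, which cx_Oracle binds as SQL NULL when the rows are passed as bind-variable tuples.

```python
import numpy as np
import pandas as pd

# Tiny frame standing in for the 74-column CSV (hypothetical data).
df = pd.DataFrame({"FO_NUM": [101.0, np.nan], "FILE_NO": ["A-1", np.nan]})

# Cast to object first so the replacement value can be Python None,
# then swap every NaN for None; cx_Oracle binds None as SQL NULL.
clean = df.astype(object).where(df.notnull(), None)

# Plain tuples, ready for cursor.executemany() with :1, :2 placeholders.
rows = list(clean.itertuples(index=False, name=None))
```

The key point is that None, unlike the string "NULL", binds cleanly into both NUMBER and VARCHAR2 columns.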

When the code below is run, I get the "ORA-01722: invalid number" error. I understand why I'm getting this: the string "NULL" is being loaded into a number field, so there's a type mismatch. The question is, what is the proper way to do this, accounting for a diverse dataframe?

import pandas as pd
from datetime import datetime

surveyData = pd.read_csv(r"FM_Sample.csv", delimiter=',', index_col=False)
surveyData.fillna("NULL", inplace=True)

def insertDFrecs(query):
    import cx_Oracle
    connstr = 'URL:port/dbname'
    conn = cx_Oracle.connect('user', 'pass', connstr)
    curs = conn.cursor()
    curs.execute(query)
    conn.commit()
    conn.close()

oracleFieldsList = ['FO_NUM', 'WR_ID' ... 'FILE_NO', 'UMW']
oracle_fields = ",".join(oracleFieldsList)
    
try:
    for i, row in surveyData.iterrows():
        FO_NUM = row['FO_NUM'] #NUMBER
        WR_ID = row['WR_ID'] #NUMBER
        ... = ...
        FILE_NO = row['FILE_NO']
        UMW = row['UMW']

        entryValuesList = [FO_NUM, WR_ID, PDIV_ID ... str(FILE_NO), str(UMW)]
        entry_values = str(entryValuesList).strip("[]")

        sql = "INSERT INTO ORACLE_TABLE (" + oracle_fields + ") VALUES (" + entry_values + ")"
        insertDFrecs(sql)
except Exception as e:
    print(e)
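For context, the usual alternative to pasting values into the SQL string is a parameterized statement. This is a hypothetical sketch (the column list is shortened for illustration): build one INSERT with `:1`, `:2`, ... bind placeholders, then pass each row as a tuple, so None binds as NULL and quoting/injection problems disappear.

```python
# Shortened, illustrative column list standing in for the 74 real columns.
oracleFieldsList = ["FO_NUM", "WR_ID", "FILE_NO", "UMW"]

oracle_fields = ",".join(oracleFieldsList)
placeholders = ",".join(":%d" % (i + 1) for i in range(len(oracleFieldsList)))

# One statement reused for every row via cursor.execute()/executemany().
sql = "INSERT INTO ORACLE_TABLE (%s) VALUES (%s)" % (oracle_fields, placeholders)
```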

I was expecting the data from the CSV to convert to a pandas df and then load into an Oracle table.

Upvotes: 0

Views: 122

Answers (1)

Christopher Jones

Reputation: 10681

I don't see any issue loading empty fields as NULL when using code like the cx_Oracle load_csv.py sample, for example:

import csv

# 'connection' and 'cursor' are an open cx_Oracle connection and cursor,
# created as in the load_csv.py sample.

# Set cursor sizes to match the table definition or known max data sizes
#   create table test (id number, name varchar2(25));
cursor.setinputsizes(None, 25)

# Adjust the batch size to meet your memory and performance requirements
batch_size = 100

with open(FILE_NAME, 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    sql = "insert into test (id,name) values (:1, :2)"
    data = []
    for line in csv_reader:
        data.append((line[0], line[1]))
        if len(data) % batch_size == 0:
            cursor.executemany(sql, data)
            data = []
    if data:
        cursor.executemany(sql, data)
    connection.commit()
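The reason empty CSV fields arrive as NULL here: Python's csv module returns them as empty strings, and a bound empty string is stored as NULL by Oracle. A quick stdlib-only check of the reader behavior (the sample rows mirror the data file below):

```python
import csv
import io

# Three rows like those in the sample data file: a missing name,
# a missing id, and a fully empty row.
sample = "4,\n,Baida\n,\n"
rows = list(csv.reader(io.StringIO(sample), delimiter=","))
# Empty fields come back as '' — which Oracle stores as NULL when bound.
```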

With the data file:

1,Abel
2,Ande
3,Atkinson
4,
5,Baer
,Baida
7,Banda
,
9,Bell

This loads fine. Checking in SQL*Plus:

SQL> set NULL XXXX
SQL> select * from test;

        ID NAME
---------- -------------------------
         1 Abel
         2 Ande
         3 Atkinson
         4 XXXX
         5 Baer
XXXX       Baida
         7 Banda
XXXX       XXXX
         9 Bell

Upvotes: 1
