KMM
KMM

Reputation: 53

Writing a dataframe from Python3 into Netezza

I want to write a dataframe straight from Python into Netezza. Is this even possible? What I am doing right now is exporting to CSV then uploading into Netezza which is quite time consuming...

A dummy dataset is as follows:

s1=pd.Series(["Test1"])
s2=pd.Series(["Test2"])
s3=pd.Series(["Test3"])
df=pd.DataFrame([list(s1), list(s2),  list(s3)],  columns =  ["STATUS"])

I then connect to Netezza (example):

#import libraries
import pandas as pd
import pyodbc

engine = pyodbc.connect("DRIVER={NetezzaSQL};SERVER=netezzakm01;PORT=5480;DATABASE=SDSNZKM01;UID=KMMTBA;PWD=xxx;")

This is as far as I get because anything else I have tried to write with has come nowhere near working...... is it possible to do this?

Upvotes: 2

Views: 1680

Answers (1)

tipanverella
tipanverella

Reputation: 3677

So far, the best I have been able to do is the following:

  1. On the netezza side I create a table with the structure I want,

    create table mytablename ( MYDATE DATE, MYDISTRIBUTEKEY BIGINT, SOMESTRING VARCHAR(128) DEFAULT NULL, SOMEINT INTEGER DEFAULT NULL, SOMEFLOAT FLOAT DEFAULT NULL ) distribute on (mydistributekey);

  2. my dataframe say mydf is such that

    list(mydf.columns) == ['MYDATE','MYDISTRIBUTEKEY','SOMESTRING','SOMEINT','SOMEFLOAT']

  3. dump the dataframe to disk

    mydf.to_csv('df_on_disk.tab',sep='\t',index=False,header=False)

  4. you can now to a BULK load of the file on disk into netezza, from netezza. Note that I connect via a JDBC, with jaydebeapi.

    INSERT INTO mytablename SELECT * FROM EXTERNAL 'df_on_disk.tab' USING (DELIM '\t' REMOTESOURCE 'JDBC'); GENERATE STATISTICS ON mytablename;

This is not the exact solution, but you would want to do something like that.

Upvotes: 1

Related Questions