Reputation: 53
I want to write a dataframe straight from Python into Netezza. Is this even possible? What I am doing right now is exporting to CSV and then uploading into Netezza, which is quite time-consuming.
A dummy dataset is as follows:
s1=pd.Series(["Test1"])
s2=pd.Series(["Test2"])
s3=pd.Series(["Test3"])
df=pd.DataFrame([list(s1), list(s2), list(s3)], columns = ["STATUS"])
I then connect to Netezza (example):
#import libraries
import pandas as pd
import pyodbc
engine = pyodbc.connect("DRIVER={NetezzaSQL};SERVER=netezzakm01;PORT=5480;DATABASE=SDSNZKM01;UID=KMMTBA;PWD=xxx;")
This is as far as I get, because everything else I have tried has come nowhere near working. Is it possible to do this?
Upvotes: 2
Views: 1680
Reputation: 3677
So far, the best I have been able to do is the following. On the Netezza side I create a table with the structure I want:
create table mytablename (
    MYDATE DATE,
    MYDISTRIBUTEKEY BIGINT,
    SOMESTRING VARCHAR(128) DEFAULT NULL,
    SOMEINT INTEGER DEFAULT NULL,
    SOMEFLOAT FLOAT DEFAULT NULL
)
distribute on (mydistributekey);
My dataframe, say mydf, is such that
list(mydf.columns) == ['MYDATE','MYDISTRIBUTEKEY','SOMESTRING','SOMEINT','SOMEFLOAT']
Dump the dataframe to disk:
mydf.to_csv('df_on_disk.tab',sep='\t',index=False,header=False)
You can now do a BULK load of the file on disk into Netezza, from Netezza. Note that I connect via JDBC, with jaydebeapi.
INSERT INTO mytablename
SELECT * FROM EXTERNAL 'df_on_disk.tab'
USING (DELIM '\t' REMOTESOURCE 'JDBC');

GENERATE STATISTICS ON mytablename;
This is not the exact solution, but you would want to do something like that.
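Putting the steps above together, a minimal end-to-end sketch could look like this. The table name (mytablename), file name (df_on_disk.tab), and column layout are the example ones from above; the jaydebeapi driver class, JDBC URL, credentials, and jar path in the commented-out part are placeholders you would replace with your own, and that part only runs against a live Netezza server.

```python
import pandas as pd

# Dataframe whose columns match the example Netezza table definition above.
mydf = pd.DataFrame({
    "MYDATE": ["2020-01-01"],
    "MYDISTRIBUTEKEY": [1],
    "SOMESTRING": ["hello"],
    "SOMEINT": [42],
    "SOMEFLOAT": [3.14],
})

# Step 1: dump the dataframe as a tab-separated file, no index, no header,
# so the columns line up positionally with the table definition.
mydf.to_csv("df_on_disk.tab", sep="\t", index=False, header=False)

# Step 2: build the bulk-load statement. REMOTESOURCE 'JDBC' tells Netezza
# that the file lives on the client side of the JDBC connection.
load_sql = (
    "INSERT INTO mytablename "
    "SELECT * FROM EXTERNAL 'df_on_disk.tab' "
    "USING (DELIM '\\t' REMOTESOURCE 'JDBC')"
)

# Step 3: run it over a live jaydebeapi connection (placeholder details):
# import jaydebeapi
# conn = jaydebeapi.connect(
#     "org.netezza.Driver",
#     "jdbc:netezza://netezzakm01:5480/SDSNZKM01",
#     ["KMMTBA", "xxx"],
#     "nzjdbc.jar",
# )
# cur = conn.cursor()
# cur.execute(load_sql)
# cur.execute("GENERATE STATISTICS ON mytablename")
# conn.commit()
```

The point of the external-table load is that Netezza streams the whole file in one statement, which is far faster than row-by-row INSERTs through the driver.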
Upvotes: 1