user8834780

Reputation: 1670

pandas dataframe header relation to sql table header

I have a DataFrame with 100+ columns that I am trying to write into a SQL table. This is a daily full-load process, and the code looks like this:

import datetime
import os

import pandas as pd
import sqlalchemy as sa

now = datetime.datetime.now()
filename = 'extract_' + str(now)[:10]
output = "./output"

sql_df = pd.read_csv(os.path.join(output, filename + '.csv'))
server = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, str(PORT), DATABASE)
engine = sa.create_engine(server, connect_args={'sslmode': 'verify-ca'}, use_batch_mode=True)
conn = engine.raw_connection()

cur = conn.cursor()
cur.execute('truncate table "table"')
conn.commit()  # the truncate is not applied until the connection commits
%time sql_df.to_sql('table', engine, index=False, if_exists='append')  # %time is an IPython magic

I am getting the following error: ProgrammingError: (psycopg2.ProgrammingError) column "column" of relation "table" does not exist.

Can someone clarify why to_sql compares the DataFrame's headers to the table's columns (and refuses the insert unless they all match), rather than just inserting the values positionally and skipping the header row? If instead of writing the DataFrame to SQL I upload the CSV to S3 and then use COPY, there are no errors.

Is there anything I can do to make it ignore the headers and just insert the values?
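(For context, this is the name-matching behavior in question. A minimal reproduction, using a throwaway in-memory SQLite database purely for illustration, shows that to_sql generates an INSERT that names the DataFrame's columns, so an append fails when those names are missing from the table, and succeeds once they are renamed to match:)

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")            # throwaway in-memory DB
conn.execute("CREATE TABLE t (a INTEGER)")    # the table column is named "a"

df = pd.DataFrame({"b": [1, 2]})              # the DataFrame column is named "b"
try:
    # pandas emits INSERT INTO t (b) VALUES (?), which the DB rejects
    df.to_sql("t", conn, index=False, if_exists="append")
except Exception as exc:
    print("append failed:", exc)

# Renaming the DataFrame column to match the table makes the same call succeed
df.rename(columns={"b": "a"}).to_sql("t", conn, index=False, if_exists="append")
```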

Upvotes: 3

Views: 4892

Answers (1)

d_kennetz

Reputation: 5359

You can rename the input df's columns to match the names in the table like this (where db_cols is the list of your database column names); I think this should work for your Redshift setup:

db_cols = list(pd.read_sql('...', con=alch_engine))  # '...' is a query against your table; list() returns its columns

(sql_df
 .rename(columns=dict(zip(sql_df.columns, db_cols)))
 .to_sql(name="table",
         con=alch_engine,
         if_exists="append",
         index=False,
         index_label=None))
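Note that zip pairs the two column lists positionally, so this assumes the DataFrame's column order already matches the table's. A self-contained sketch of the whole recipe (in-memory SQLite and made-up names, just for illustration), fetching db_cols with a zero-row query:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Headers differ from the table only in case, so a positional rename is safe
sql_df = pd.DataFrame({"ID": [1, 2], "Name": ["alice", "bob"]})

# Read zero rows just to get the table's column names, in table order
db_cols = list(pd.read_sql("SELECT * FROM target LIMIT 0", conn))

(sql_df
 .rename(columns=dict(zip(sql_df.columns, db_cols)))  # positional pairing
 .to_sql("target", conn, if_exists="append", index=False))
```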

Upvotes: 4
