Reputation:
I have a pandas dataframe which has 10 columns and 10 million rows.
I have created an empty table in pgAdmin 4 (a management tool for PostgreSQL databases, comparable to SQL Server Management Studio for MSSQL) for this data to be stored in.
However, when running the following command:
my_dataframe.to_sql('name_of_sql_table', connection, index=False, if_exists='append', method='multi')
It takes a very long time to run and often crashes my Jupyter kernel, since the process is so long and runs out of memory.
Are there any advisable methods for speeding up the "send pandas DataFrame to SQL table" step?
One thing I can think of would be to split the data into, say, one-million-row chunks and send them one at a time, appending the rows with each to_sql() call.
I do not have the option of loading the data directly through pgAdmin 4 - my only option is to send the data from Python.
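A minimal sketch of that chunking idea, assuming the same `my_dataframe` and SQLAlchemy `connection` as above (the helper name and the one-million-row default are illustrative):

```python
def to_sql_in_chunks(df, table_name, conn, chunk_rows=1_000_000):
    # Append the frame slice by slice so pandas only ever has to
    # build the INSERT for one chunk of rows in memory at a time.
    for start in range(0, len(df), chunk_rows):
        df.iloc[start:start + chunk_rows].to_sql(
            table_name, conn, index=False, if_exists='append')
```

Each slice is a view-like selection of the original frame, so the memory overhead per call is bounded by the chunk size rather than the full ten million rows.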
Upvotes: 2
Views: 13717
Reputation: 31
I faced this issue as well, though I do not use method='multi'; even with chunksize=1000 it was crashing with the error below.
ProgrammingError("(pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request. (8003) (SQLExecDirectW)')",),
So I divided the 2100-parameter limit by my number of columns and used the result, 150, as the chunksize. If there is a better way then please do let me know.
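In other words, pyodbc's 2100 limit applies to the total bind parameters per statement (rows × columns), so a safe chunksize can be derived from the column count. A small sketch (the 14-column count is an assumption that matches the 150 figure above):

```python
def safe_chunksize(n_cols, max_params=2100):
    # Largest rows-per-INSERT that keeps rows * cols at or under the
    # driver's bind-parameter cap (2100 for the SQL Server ODBC driver).
    return max(1, max_params // n_cols)

# e.g. a 14-column frame: safe_chunksize(14) -> 150
```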
Upvotes: 2
Reputation: 1962
Have a look at the DataFrame.to_sql documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
If this applies to your version of pandas, use
df.to_sql("table_name",
          connection,
          index=False,
          if_exists='append',
          chunksize=25000,
          method=None)
Your query might be crashing because you're using method='multi', as the documentation describes:
method : {None, 'multi', callable}, default None
Controls the SQL insertion clause used:
None: uses standard SQL INSERT clause (one per row).
'multi': Pass multiple values in a single INSERT clause.
callable with signature (pd_table, conn, keys, data_iter): Details and a sample callable implementation can be found in the section insert method.
This means that with method='multi', pandas constructs the statement in memory for all rows at once. Using chunksize together with one INSERT statement per row allows pandas to chunk up the save to the database.
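Since the target here is PostgreSQL, another option is a custom method callable built on COPY, which is usually much faster than row-by-row INSERTs. The pandas documentation's "insertion method" section shows an example along these lines; the sketch below is adapted from it and assumes the engine is backed by psycopg2:

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    # Insertion method for DataFrame.to_sql that streams the rows to
    # PostgreSQL via COPY ... FROM STDIN instead of INSERT statements.
    # `conn` is the SQLAlchemy connection to_sql passes in; its
    # .connection attribute is the raw DBAPI (psycopg2) connection.
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)   # serialize the chunk as CSV
        buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name
        cur.copy_expert(
            'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns),
            buf)

# usage with a (hypothetical) SQLAlchemy engine:
# my_dataframe.to_sql('name_of_sql_table', engine, index=False,
#                     if_exists='append', method=psql_insert_copy)
```

Note this is PostgreSQL-specific: copy_expert is a psycopg2 call and COPY is a Postgres command, so it will not work against other databases.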
Upvotes: 5