user10624646

Best method for sending large pandas dataframe to SQL database?

I have a pandas dataframe which has 10 columns and 10 million rows.

I have created an empty table in pgAdmin 4 (a management tool for PostgreSQL databases, similar to SQL Server Management Studio for MSSQL) for this data to be stored.

However, when running the following command:

my_dataframe.to_sql('name_of_sql_table', connection, index=False, if_exists='append', method='multi')

It takes a very long time to run and often crashes my Jupyter kernel, since the process is so long and runs out of memory.

Are there any advisable methods for speeding up sending the pandas dataframe to the SQL table?

One thing I can think of would be to split the data into, say, one-million-row chunks and send them one at a time, appending the rows with each to_sql() call.
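A minimal sketch of that chunking idea, using an in-memory SQLite engine as a stand-in for the real PostgreSQL connection (the connection string and chunk size here are placeholders):

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine for the sketch; swap in your real connection string,
# e.g. create_engine("postgresql://user:pass@host:5432/dbname")
engine = create_engine("sqlite://")

df = pd.DataFrame(np.random.rand(100_000, 10),
                  columns=[f"col_{i}" for i in range(10)])

chunk_rows = 25_000  # tune to what fits comfortably in memory
for start in range(0, len(df), chunk_rows):
    # Only one slice at a time is converted into an INSERT statement
    df.iloc[start:start + chunk_rows].to_sql(
        "name_of_sql_table", engine, index=False, if_exists="append")
```

Note that to_sql() has a built-in chunksize= parameter that does essentially the same thing without the manual loop.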

I do not have the option of loading the data directly in pgAdmin 4 - my only method is to send data from Python to the database.

Upvotes: 2

Views: 13717

Answers (2)

Sunny

Reputation: 31

I faced this issue as well, although I do not use method='multi'; with chunksize=1000 it was crashing with the error below.

ProgrammingError("(pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request. (8003) (SQLExecDirectW)')",),

So I divided 2100 by my number of columns and used 150 as the chunksize. If there is a better way, please let me know.
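As a sketch, the arithmetic behind that choice looks like this (the 2100-parameter cap is specific to pyodbc/SQL Server, and safe_chunksize is a hypothetical helper name, not a pandas API):

```python
def safe_chunksize(n_cols, max_params=2100):
    # With parameterized inserts, every row contributes one bind
    # parameter per column, so rows_per_chunk * n_cols must stay
    # strictly under the driver's cap.
    return max(1, (max_params - 1) // n_cols)

# For a 10-column frame this allows up to 209 rows per chunk;
# a round value like 150 leaves comfortable headroom.
chunksize = safe_chunksize(10)
```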

Upvotes: 2

Napoleon Borntoparty

Reputation: 1962

Have a look at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

If this applies to your version of pandas, use

df.to_sql("table_name", 
          connection, 
          index=False, 
          if_exists='append',
          chunksize=25000,
          method=None)

Your query might be crashing because you're using method='multi', which the docs describe as follows:

method : {None, 'multi', callable}, default None

Controls the SQL insertion clause used:

- None: uses standard SQL INSERT clause (one per row).
- 'multi': pass multiple values in a single INSERT clause.
- callable with signature (pd_table, conn, keys, data_iter). Details and a sample callable implementation can be found in the section insert method.

This means that with method='multi', pandas constructs the statement in memory for all rows at once. Using chunksize together with one INSERT statement per row (method=None) lets pandas break the save to the database into manageable pieces.
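Since the target here is PostgreSQL, the callable option is also worth a look: the pandas docs' "insertion method" section shows a COPY-based callable, which is usually much faster than row-wise INSERTs. A sketch along the lines of that example, assuming a psycopg2-backed SQLAlchemy engine:

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    """COPY-based insert method, adapted from the pandas docs example.

    `table` is a pandas SQLTable, `keys` the column names, and
    `data_iter` an iterable of row tuples; requires psycopg2.
    """
    dbapi_conn = conn.connection  # raw psycopg2 connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)  # stage rows as CSV in memory
        buf.seek(0)
        columns = ", ".join(f'"{k}"' for k in keys)
        table_name = (f"{table.schema}.{table.name}"
                      if table.schema else table.name)
        cur.copy_expert(
            f"COPY {table_name} ({columns}) FROM STDIN WITH CSV", buf)
```

You would then pass it as df.to_sql("name_of_sql_table", engine, index=False, if_exists="append", method=psql_insert_copy).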

Upvotes: 5
