cvluepke

Reputation: 105

Speed up pandas df to sql table

I have a pandas df that looks like this:

Datum      Kasse Bon Articles
2019-05-01 101   1   Oranges
2019-05-01 101   2   Apples
2019-05-01 101   3   Banana

Basically it's four columns (date, smallint, smallint, string). It contains millions of rows.

I am using the following code to import it into my database.

query_insert_tmp = """
    INSERT INTO <my_table>
    VALUES ('{}',{},{},'{}')
    """

for index, r in edited_df.iterrows():
    cursor.execute(query_insert_tmp.format(r[0],r[1],r[2],r[3]))

It works fine, but it takes a long time. Do you guys know any other ways to speed it up?

Upvotes: 0

Views: 1619

Answers (2)

ASH

Reputation: 20302

This is how I would do it.

from datetime import datetime

import pandas as pd

import pyodbc
from fast_to_sql import fast_to_sql as fts

# Test Dataframe for insertion
df = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": ["A", "B", "C"],
    "Col3": [True, False, True],
    "Col4": [datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3)]
})

# Create a pyodbc connection
conn = pyodbc.connect(
    """
    Driver={ODBC Driver 17 for SQL Server};
    Server=localhost;
    Database=my_database;
    UID=my_user;
    PWD=my_pass;
    """
)

# If a table is created, the generated sql is returned
create_statement = fts.fast_to_sql(
    df, "my_great_table", conn,
    if_exists="replace",
    custom={"Col1": "INT PRIMARY KEY"},
    temp=False,
)

# Commit upload actions and close connection
conn.commit()
conn.close()

The main function signature:

fts.fast_to_sql(df, name, conn, if_exists="append", custom=None, temp=False)

Or, try this.

import pandas as pd
from sqlalchemy import create_engine

# to_sql needs a SQLAlchemy engine, not a raw connection string;
# only the first URL parameter uses ?, the rest are joined with &
engine = create_engine(
    "mssql+pyodbc://server_name/db_name"
    "?driver=SQL+Server+Native+Client+11.0"
    "&trusted_connection=yes"
)

# your dataframe is here; the first argument is the target table name,
# and index=False keeps the RangeIndex from being written as an extra column
df.to_sql("my_table", engine, if_exists='append', index=False, chunksize=100000)
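The call above targets SQL Server; as a self-contained sketch you can actually run, here is the same to_sql call against an in-memory SQLite engine (the table name sales and the sample rows are illustrative, not from the question):

```python
import pandas as pd
from sqlalchemy import create_engine

# Sample data shaped like the question's DataFrame
df = pd.DataFrame({
    "Datum": ["2019-05-01", "2019-05-01", "2019-05-01"],
    "Kasse": [101, 101, 101],
    "Bon": [1, 2, 3],
    "Articles": ["Oranges", "Apples", "Banana"],
})

# In-memory SQLite stands in for the real server here
engine = create_engine("sqlite://")

# method="multi" packs many rows into each INSERT statement,
# which is usually much faster than one INSERT per row
df.to_sql("sales", engine, if_exists="append", index=False,
          chunksize=100000, method="multi")

n = pd.read_sql("SELECT COUNT(*) AS n FROM sales", engine)["n"][0]
print(n)  # 3
```

Whether method="multi" helps depends on the driver and server; for SQL Server specifically, benchmark it against the default before committing to it.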

https://pypi.org/project/fast-to-sql/

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

Upvotes: 2

Parfait

Reputation: 107642

Consider to_numpy()+tolist() and executemany() with parameters:

query_insert_tmp = """
    INSERT INTO <my_table>
    VALUES (?, ?, ?, ?)
    """

sql_data = edited_df.to_numpy().tolist()

cursor.executemany(query_insert_tmp, sql_data)
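With data shaped like the question's sample, the conversion yields a plain list of row lists, which is exactly the parameter shape executemany() expects (a runnable sketch; the DataFrame below just mirrors the question's sample):

```python
import pandas as pd

edited_df = pd.DataFrame({
    "Datum": ["2019-05-01", "2019-05-01", "2019-05-01"],
    "Kasse": [101, 101, 101],
    "Bon": [1, 2, 3],
    "Articles": ["Oranges", "Apples", "Banana"],
})

# One inner list per row: [date, Kasse, Bon, article]
sql_data = edited_df.to_numpy().tolist()
print(sql_data[0])  # first row as a plain list
```

If the cursor comes from pyodbc with a SQL Server ODBC driver, setting `cursor.fast_executemany = True` before the executemany() call can speed things up further by sending the rows to the server in batches instead of one round trip per row.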

Upvotes: 2
