Nipun Rai

Reputation: 41

Inserting Data to SQL Server from a Python Dataframe Quickly

I have been trying to insert data from a Python dataframe into a table that already exists in SQL Server. The dataframe has 90K rows, and I want the fastest possible way to insert the data into the table. I only have read, write, and delete permissions on the server; I cannot create any tables.

Below is the code I am using. It inserts the data, but it is very slow. Please advise.

import pandas as pd
import pyodbc

df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;'
                      'Database=Database Name;'
                      'UID=User ID;'
                      'PWD=Password;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()

# Delete the existing data in the SQL table
cursor.execute("DELETE FROM database.schema.TableName")
conn.commit()

# Insert the data row by row (slow)
for index, row in df.iterrows():
    cursor.execute("INSERT INTO database.schema.TableName([A],[B],[C]) VALUES (?,?,?)",
                   row['A'], row['B'], row['C'])
conn.commit()
cursor.close()
conn.close()

Upvotes: 4

Views: 46303

Answers (5)

Lenn Denis

Reputation: 1

Try populating a temp table that has one index or none, then inserting from it into your target table all at once. That might speed things up, since the indexes on the target table don't have to be updated after every single insert. A sketch of the idea is below.
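
A minimal sketch of this approach with pyodbc, reusing the connection and placeholder names from the question. The #staging table name and NVARCHAR column types are assumptions for illustration; session-scoped temp tables don't require CREATE TABLE permission in the user database:

import pandas as pd
import pyodbc

conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;Database=Database Name;'
                      'UID=User ID;PWD=Password;Trusted_Connection=no;')
cursor = conn.cursor()

# Session-scoped temp table with no extra indexes (types are placeholders)
cursor.execute("CREATE TABLE #staging ([A] NVARCHAR(255), [B] NVARCHAR(255), [C] NVARCHAR(255))")

# Bulk-load the dataframe into the temp table in one batched call
df = pd.read_excel(r"Url path\abc.xlsx")
cursor.fast_executemany = True
cursor.executemany("INSERT INTO #staging ([A],[B],[C]) VALUES (?,?,?)",
                   df[['A', 'B', 'C']].values.tolist())

# One set-based insert into the real table
cursor.execute("INSERT INTO database.schema.TableName ([A],[B],[C]) "
               "SELECT [A],[B],[C] FROM #staging")
conn.commit()
cursor.close()
conn.close()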

Upvotes: -2

Lakshmi Manikandan

Reputation: 31

Here is a script; I hope this works for you. It builds the INSERT statement from the dataframe's columns and uses pyodbc's fast_executemany to batch the inserts.

import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()

df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')

# Build the INSERT statement from the dataframe's columns
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))

# Batch the inserts instead of sending them one row at a time
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()

Upvotes: 3

Matt

Reputation: 179

For one thing, replace df.iterrows() with df.apply(). Removing the explicit loop is much more efficient; see the sketch below.
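
A minimal sketch of the df.apply() version, reusing the dataframe, cursor, connection, and placeholder table name from the question:

# df, cursor, and conn as set up in the question
def insert_row(row):
    cursor.execute("INSERT INTO database.schema.TableName([A],[B],[C]) VALUES (?,?,?)",
                   row['A'], row['B'], row['C'])

# Apply the insert function across rows instead of an explicit Python loop
df.apply(insert_row, axis=1)
conn.commit()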

Upvotes: -1

ASH

Reputation: 20362

This should do what you want; it's a very generic example.

# Insert from a dataframe into a table in SQL Server
import time
import pandas as pd
import pyodbc

# Start a timer so we can measure how long the insert takes
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=Excel-PC\SQLEXPRESS;'
    r'DATABASE=NORTHWND;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) VALUES (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
# Commit once after the loop rather than once per row
cnxn.commit()
cursor.close()
cnxn.close()

# See the total time taken by the insert
print("--- %s seconds ---" % (time.time() - start_time))

Try that and post back if you have additional questions/issues/concerns.

Upvotes: -1

NYC Coder

Reputation: 7604

To insert the data much faster, try sqlalchemy and df.to_sql. This requires creating an engine with sqlalchemy; to make things faster, pass the option fast_executemany=True:

import urllib.parse
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)

with engine.connect() as connection:
    # if_exists='append' keeps the existing table and adds the rows to it
    df.to_sql(<table name>, connection, if_exists='append', index=False)

Upvotes: 5
