Bethlee

Reputation: 945

How can I insert a pandas DataFrame into a database without duplicating data?

I used the .to_sql function to insert data, but it does not check for duplicate rows. (Its if_exists option only checks whether the table already exists.)

Source code: I ran the code below twice.

userData.to_sql(con=engine, name='test_quest_complete', schema='test', if_exists='append')

Result: the same data was inserted into the table twice.

0   2016-11-14 00:00:10 AAAA
1   2016-11-14 00:00:20 BBBB
0   2016-11-14 00:00:10 AAAA
1   2016-11-14 00:00:20 BBBB
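A minimal reproduction of this behaviour, sketched under assumptions: an in-memory SQLite database stands in for the real server, the "test" schema is dropped, and the column names event_time and name are hypothetical (the original column names are not shown). Because if_exists only governs what happens when the table already exists, the second append simply writes the same rows again.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite instead of the original database server.
conn = sqlite3.connect(":memory:")

# Hypothetical column names; the values match the rows shown above.
userData = pd.DataFrame(
    {
        "event_time": ["2016-11-14 00:00:10", "2016-11-14 00:00:20"],
        "name": ["AAAA", "BBBB"],
    }
)

# Run the same append twice: the table exists the second time,
# so if_exists="append" just adds the rows again.
for _ in range(2):
    userData.to_sql("test_quest_complete", conn, if_exists="append", index=False)

result = pd.read_sql("SELECT * FROM test_quest_complete", conn)
print(result)  # four rows: AAAA and BBBB, each stored twice
```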

How can I insert a pandas DataFrame into a database without inserting duplicate rows?

(I also tried LOAD DATA LOCAL INFILE, but I can't use it for security reasons.)

Upvotes: 1

Views: 2879

Answers (2)

Poonam

Reputation: 679

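import pandas as pd
from sqlalchemy import create_engine


##Data of Excel File - ExcelData(Sheet1)
##id    name
##1 11
##2 22
##3 33
##4 44
##5 55


##CREATE TABLE [test].[test_quest_complete](
##  [id] [int] NULL,
##  [name] [int] NULL
##) 


TblName = "test_quest_complete"
# One SQLAlchemy engine (ODBC DSN "mydsn") is used for both reading and writing.
engine = create_engine("mssql+pyodbc://mydsn")

# Raw string, so the backslashes in the Windows path are not treated as escapes.
file_name = r"C:\Users\poonamr\Desktop\ExcelData.xlsx"
xl = pd.ExcelFile(file_name)
userData = xl.parse("Sheet1")
print(userData)

# Rows that are already stored in the table.
sql = "SELECT * FROM test." + TblName
tblData = pd.read_sql(sql, engine)
print(tblData)

# tblData is concatenated twice, so every row already in the table occurs at least
# twice and is removed by keep=False. Only rows that exist solely in userData survive
# (a row repeated inside userData itself is dropped as well).
Finalresult = pd.concat([userData, tblData, tblData]).drop_duplicates(keep=False)
print(Finalresult)

# Append only the new rows.
Finalresult.to_sql(TblName, engine, if_exists='append', schema='test', index=False)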
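The same "keep only rows not yet in the table" step can also be written as a left anti-join with DataFrame.merge(indicator=True), which states the intent more directly than the concat/drop_duplicates trick. This is a sketch under assumptions: an in-memory SQLite database replaces the SQL Server DSN, the helper name new_rows_only is hypothetical, and both frames are assumed to share the same columns and dtypes.

```python
import sqlite3

import pandas as pd


def new_rows_only(userData: pd.DataFrame, tblData: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of userData that do not already appear in tblData.

    Left anti-join: merge on all shared columns, then keep the rows that were
    found only on the left side. Hypothetical helper for illustration.
    """
    merged = userData.merge(tblData.drop_duplicates(), how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


# Assumption: in-memory SQLite instead of the SQL Server DSN from the answer.
conn = sqlite3.connect(":memory:")

# The table already holds ids 1-3.
pd.DataFrame({"id": [1, 2, 3], "name": [11, 22, 33]}).to_sql(
    "test_quest_complete", conn, index=False
)

# The Excel data from the answer: ids 1-5.
userData = pd.DataFrame({"id": [1, 2, 3, 4, 5], "name": [11, 22, 33, 44, 55]})
tblData = pd.read_sql("SELECT * FROM test_quest_complete", conn)

to_insert = new_rows_only(userData, tblData)  # only ids 4 and 5
to_insert.to_sql("test_quest_complete", conn, if_exists="append", index=False)

final = pd.read_sql("SELECT * FROM test_quest_complete ORDER BY id", conn)
print(final)
```

Running the same steps a second time finds no new rows, so nothing more is appended.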

Upvotes: 1

hanego

Reputation: 1635

If you have administrative rights on the database, I would suggest putting a constraint (for example a UNIQUE constraint or a primary key) on the table itself. The Python insertion will then raise an exception when it tries to insert a duplicate row, and you can catch it. Otherwise, you can first retrieve the existing data from the table and merge it with your DataFrame in pandas. Then group by all the columns to find the rows that do not exist in the table yet, and insert only those as a new DataFrame.
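A minimal sketch of the constraint approach, under assumptions: an in-memory SQLite database, hypothetical column names event_time and name, and a UNIQUE constraint on both columns as the (hypothetical) definition of a duplicate. Because a bulk insert of the whole DataFrame may be rejected as a whole when one row violates the constraint, this sketch inserts row by row and skips only the rows the database refuses. The helper insert_skipping_duplicates is illustrative, not a pandas API.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite; the UNIQUE constraint decides what a duplicate is.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_quest_complete ("
    "event_time TEXT, name TEXT, UNIQUE (event_time, name))"
)

userData = pd.DataFrame(
    {
        "event_time": ["2016-11-14 00:00:10", "2016-11-14 00:00:20"],
        "name": ["AAAA", "BBBB"],
    }
)


def insert_skipping_duplicates(df: pd.DataFrame, conn: sqlite3.Connection) -> int:
    """Insert df row by row, skipping rows rejected by the UNIQUE constraint.

    Returns the number of rows actually inserted. Hypothetical helper.
    """
    inserted = 0
    for row in df.itertuples(index=False, name=None):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO test_quest_complete (event_time, name) VALUES (?, ?)",
                    row,
                )
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # the row already exists, so the constraint rejected it
    return inserted


first = insert_skipping_duplicates(userData, conn)   # both rows are new
second = insert_skipping_duplicates(userData, conn)  # both rows are duplicates
print(first, second)
print(pd.read_sql("SELECT * FROM test_quest_complete", conn))
```

Row-by-row inserts are slower than one bulk insert, but they let the database, not pandas, be the single authority on what counts as a duplicate.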

Upvotes: 3
