Usman Rafiq
Usman Rafiq

Reputation: 580

Avoid Inserting those records which is already in SQL table

I am trying to insert pandas Data Frame into SQL using sqlalchemy. The table is already existed in Database with three columns ID, Brand and Price. ID is identity column . How I can I check before inserting each row from pandas data frame if the Brand is already existed or not.

    import pandas as pd

    cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

    df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

    from sqlalchemy import create_engine, MetaData, Table, select
    from six.moves import urllib

    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=server;DATABASE=mydb;UID=user;PWD=psw")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params) 
    engine.connect() 

    # suppose df is the data-frame that we want to insert in database
    df.to_sql(name='mytable',con=engine, index=False, if_exists='append')

    print("inserted)

Upvotes: 2

Views: 1200

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31226

You are really looking at a 30 year old relational database insert pattern. INSERT if it's not already in unique index (auto increment is not a meaningful unique key)

I've used mariadb but approach is same across all DBMS. Just stick to SQL92 standard.

  1. name your temp table
  2. name the real table
  3. define what defines unique key

table definition

create table car (
    id double not null AUTO_INCREMENT,
    brand varchar(20) not null,
    price double,
    primary key (id, brand),
    unique key (brand)
)

python/sqlachemy to insert if it does not exist

import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

engine = create_engine('mysql+pymysql://sniffer:[email protected]/sniffer')

temptable = "temp"
table = "car"
key = ["Brand"]
df.to_sql(name=temptable,con=engine, index=False, if_exists='append')
conn = engine.connect()
transfersql = f"""insert into {table} ({",".join(df.columns)}) 
                 select * from {temptable} t 
                 where not exists 
                   (select 1 from {table} m 
                   where {"and".join([f" t.{col} = m.{col} " for col in key])}
                   )"""
print(transfersql)
conn.execute(transfersql)
conn.execute(f"drop table {temptable}")
conn.close()

output (generated sql)

insert into car (Brand,Price) 
                 select * from temp t 
                 where not exists 
                   (select 1 from car m 
                   where  t.Brand = m.Brand 
                   )

Upvotes: 3

Related Questions