Reputation: 369
the motive is to continuously look for new records in my CSV and insert the records to the mssql using pymssql library.
The CSV initially has 244 rows and I'm trying to insert 1 value and wants to dynamically insert the new row only when the script is ran with the scheduler.
I have the script which runs every 15 seconds to insert the values, but post inserting the values the first time, the second time the script throws 'Cannot insert duplicate key in object' as I have my first column DateID
which is set a PK and terminates the statement from the first record itself, therefore doesn't insert the new row.
How do I encounter this.
Code:
def trial():
try:
for row in df.itertuples():
datevalue = datetime.datetime.strptime(row.OrderDate, format)
query= "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s,%s,%s,%s)"
cursor.execute(query, (datevalue, row.Region,row.City,row.Category))
print('"Values inserted')
conn.commit()
conn.close()
except Exception as e:
print("Handle error", e)
pass
schedule.every(15).seconds.do(trial)
Library used: pymssql SQL: MSSQL server 2019
Upvotes: 0
Views: 846
Reputation: 107652
To avoid duplicate values, consider adjusting query to use EXCEPT
clause (part of UNION
and INTERSECT
set operator family) against actual data. Also, consider using executemany
by passing a nested list of all row/column data with DataFrame.to_numpy().tolist()
.
By the way if OrderDate
column is a datetime
type in data frame and database table, you do not need to re-format to string value.
def trial():
try:
query= (
"INSERT INTO data (OrderDate, Region, City, Category) "
"SELECT %s, %s, %s, %s "
"EXCEPT "
"SELECT OrderDate, Region, City, Category "
"FROM data"
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.executemany(query, vals)
print('Values inserted')
conn.commit()
except Exception as e:
print("Handle error", e)
finally:
cur.close()
conn.close()
For a faster, bulk insert, consider using a staging, temp table:
# CREATE EMPTY TEMP TABLE
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)
# INSERT INTO TEMP TABLE
query= (
"INSERT INTO #pydata (OrderDate, Region, City, Category) "
"VALUES (%s, %s, %s, %s) "
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.execute("BEGIN TRAN")
cur.executemany(query, vals)
# MIGRATE TO FINAL TABLE
query= (
"INSERT INTO data (OrderDate, Region, City, Category) "
"SELECT OrderDate, Region, City, Category "
"FROM #pydata "
"EXCEPT "
"SELECT OrderDate, Region, City, Category "
"FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")
Upvotes: 2