Reputation: 1040
I have a table on SQL Server that looks like this, where each row has a unique combination of Event A
and Event B
.
`Global Rules Table
ID Event 1 | Event 2 | Validated as | Generated as | Generated with score
1 EA1 EB1 Rule Anti-Rule 0.01
2 EA1 EB2 Rule Rule 0.95
3 ... ... ... ... ...
I have another table with a Foreign Key constraint to Global Rules Table
called Local Rules Table
.
I have a Pandas DataFrame that looks like this
Event 1 | Event 2 | Validated as | Generated as | Generated with score
EA1 EB1 Rule Rule 0.85
EA1 EB2 Rule Rule 0.95
... ... ... ... ...
Since I have this Foreign Key constraint between Local Rules
and Global Rules
tables I can't use df.to_sql('Global Rules',con,if_exists='replace')
.
The columns which I want to update in the database based on values in dataframe are Generated as
and Generated with score
, so what is the best way to only update those columns in database table based on the DataFrame I have? Is there some out of the box function or library which I don't know about?
Upvotes: 0
Views: 1057
Reputation: 1501
I haven't found a library to accomplish this. I started writing one myself to host on PyPi but haven't finished yet.
An inner join against an SQL temporary table works well in this case. It will only update a subset of columns in SQL and can be efficient for updating many records.
I assume you are using pyodbc for the connection to SQL server.
# quickly stream records into the temp table
cursor.fast_executemany = True
# assuming your DataFrame also has the ID column to perform the SQL join
statement = "CREATE TABLE [#Update_Global Rules Table] (ID BIGINT PRIMARY KEY, [Generated as] VARCHAR(200), [Generated with score] FLOAT)"
cursor.execute(statement)
# insert only the key and the updated values
subset = df[['ID','Generated as','Generated with score']]
# form SQL insert statement
columns = ", ".join(subset.columns)
values = '('+', '.join(['?']*len(subset.columns))+')'
# insert
statement = "INSERT INTO [#Update_Global Rules Table] ("+columns+") VALUES "+values
insert = [tuple(x) for x in subset.values]
cursor.executemany(statement, insert)
statement = '''
UPDATE
[Global Rules Table]
SET
u.Name
FROM
[Global Rules Table] AS t
INNER JOIN
[#Update_Global Rules Table] AS u
ON
u.ID=t.ID;
'''
cursor.execute(statement)
cursor.execute("DROP TABLE [#Update_Global Rules Table]")
Upvotes: 1