kspr
kspr

Reputation: 1040

Best way to update certain columns of table in SQL Server based on Pandas DataFrame?

I have a table on SQL Server that looks like this, where each row has a unique combination of Event A and Event B.

`Global Rules Table

ID   Event 1  |  Event 2  |  Validated as |  Generated as  |   Generated with score 
1      EA1       EB1           Rule            Anti-Rule            0.01
2      EA1       EB2           Rule            Rule                 0.95  
3      ...       ...           ...             ...                  ...

I have another table with a Foreign Key constraint to Global Rules Table called Local Rules Table.

I have a Pandas DataFrame that looks like this

      Event 1  |  Event 2  |  Validated as |  Generated as  |   Generated with score 
        EA1       EB1           Rule            Rule                 0.85
        EA1       EB2           Rule            Rule                 0.95  
        ...       ...           ...             ...                  ...

Since I have this Foreign Key constraint between Local Rules and Global Rules tables I can't use df.to_sql('Global Rules',con,if_exists='replace').

The columns which I want to update in the database based on values in dataframe are Generated as and Generated with score, so what is the best way to only update those columns in database table based on the DataFrame I have? Is there some out of the box function or library which I don't know about?

Upvotes: 0

Views: 1057

Answers (1)

Jason Cook
Jason Cook

Reputation: 1501

I haven't found a library to accomplish this. I started writing one myself to host on PyPi but haven't finished yet.

An inner join against an SQL temporary table works well in this case. It will only update a subset of columns in SQL and can be efficient for updating many records.

I assume you are using pyodbc for the connection to SQL server.

SQL Cursor

# quickly stream records into the temp table
cursor.fast_executemany = True

Create Temporary Table

# assuming your DataFrame also has the ID column to perform the SQL join
statement = "CREATE TABLE [#Update_Global Rules Table] (ID BIGINT PRIMARY KEY, [Generated as] VARCHAR(200), [Generated with score] FLOAT)"
cursor.execute(statement)

Insert DataFrame into a Temporary Table

# insert only the key and the updated values
subset = df[['ID','Generated as','Generated with score']]

# form SQL insert statement
columns = ", ".join(subset.columns)
values = '('+', '.join(['?']*len(subset.columns))+')'

# insert
statement = "INSERT INTO [#Update_Global Rules Table] ("+columns+") VALUES "+values
insert = [tuple(x) for x in subset.values]

cursor.executemany(statement, insert)

Update Values in Main Table from Temporary Table

statement = '''
UPDATE
     [Global Rules Table]
SET
     u.Name
FROM
     [Global Rules Table] AS t
INNER JOIN 
     [#Update_Global Rules Table] AS u 
ON
     u.ID=t.ID;
'''

cursor.execute(statement)

Drop Temporary Table

cursor.execute("DROP TABLE [#Update_Global Rules Table]")

Upvotes: 1

Related Questions