Reputation: 169
Context: I am using MSSQL, pandas, and pyodbc.
Steps:
UPDATE t SET t.value = df.value FROM dbo.table t where t.ID = df.ID
)Now how do I execute the sql code in the auxilliary column, without looping through each row?
sample data
The first two columns are obtained by querying dbo.table
, the third columns exists but is empty in the database. The fourth column only exists in the dataframe to prepare the SQL statement that would correspond to updating dbo.table
ID | raw | processed | strSQL |
---|---|---|---|
1 | [email protected] | lorum ipsum | UPDATE t SET t.processed = 'lorum ipsum' FROM dbo.table t WHERE t.ID = 1 |
2 | [email protected] | rumlo sumip | UPDATE t SET t.processed = 'rumlo sumip' FROM dbo.table t WHERE t.ID = 2 |
3 | ... | ... | ... |
I would like to execute the SQL script in each row in an efficient manner.
Upvotes: 1
Views: 5344
Reputation: 123409
After I recommended .executemany()
in a comment to the question, a subsequent comment from @Charlieface suggested that a table-valued parameter (TVP) would provide even better performance. I didn't think it would make that much difference, but I was wrong.
For an existing table named MillionRows
ID TextField
-- ---------
1 foo
2 bar
3 baz
…
and example data of the form
num_rows = 1_000_000
rows = [(f"text{x:06}", x + 1) for x in range(num_rows)]
print(rows)
# [('text000000', 1), ('text000001', 2), ('text000002', 3), …]
my test using a standard executemany()
call with cnxn.autocommit = False
and crsr.fast_executemany = True
crsr.executemany("UPDATE MillionRows SET TextField = ? WHERE ID = ?", rows)
took about 180 seconds (3 minutes).
However, by creating a user-defined table type
CREATE TYPE dbo.TextField_ID AS TABLE
(
TextField nvarchar(255) NULL,
ID int NOT NULL,
PRIMARY KEY (ID)
)
and a stored procedure
CREATE PROCEDURE [dbo].[mr_update]
@tbl dbo.TextField_ID READONLY
AS
BEGIN
SET NOCOUNT ON;
UPDATE MillionRows SET TextField = t.TextField
FROM MillionRows mr INNER JOIN @tbl t ON mr.ID = t.ID
END
when I used
crsr.execute("{CALL mr_update (?)}", (rows,))
it did the same update in approximately 80 seconds (less than half the time).
Upvotes: 3