Reputation: 66565
I tried to update multiple rows (approx. 350000) with a single query by implementing the following function:
def update_items(rows_to_update):
sql_query = """UPDATE contact as t SET
name = e.name
FROM (VALUES %s) AS e(id, name)
WHERE e.id = t.id;"""
conn = get_db_connection()
cur = conn.cursor()
psycopg2.extras.execute_values (
cur, sql_query, rows_to_update, template=None, page_size=100
)
While trying to run the function above, only 31 records were updated. Then, I tried to update row by row with the following function:
def update_items_row_by_row(rows_to_update):
sql_query = """UPDATE contact SET name = %s WHERE id = %s"""
conn = get_db_connection()
with tqdm(total=len(rows_to_update)) as pbar:
for id, name in rows_to_update:
cur = conn.cursor()
# execute the UPDATE statement
cur.execute(sql_query, (name, id))
# get the number of updated rows
# Commit the changes to the database
conn.commit()
cur.close()
pbar.update(1)
The latter has updated all the records so far but is very slow (estimated to end in 9 hours). Does anyone know what is the efficient way to update multiple records?
Upvotes: 0
Views: 2882
Reputation: 2567
The problem with your original function appears to be that you forgot to apply commit
. When you execute an insert/update query with psycopg2 a transaction is opened but not finalized until commit
is called. See my edits in your function (towards the bottom).
def update_items(rows_to_update):
sql_query = """UPDATE contact as t SET
name = e.name
FROM (VALUES %s) AS e(id, name)
WHERE e.id = t.id;"""
conn = get_db_connection()
cur = conn.cursor()
psycopg2.extras.execute_values(cur, sql_query, rows_to_update)
## solution below ##
conn.commit() # <- We MUST commit to reflect the inserted data
cur.close()
conn.close()
return "success :)"
If you don't want to call conn.commit()
each time you create a new cursor, you can use autocommit
such as
conn = get_db_connection()
conn.set_session(autocommit=True)
Upvotes: 0
Reputation: 66565
By splitting the list into chunks of size equal to page_size, it worked well:
def update_items(rows_to_update):
sql_query = """UPDATE contact as t SET
name = data.name
FROM (VALUES %s) AS data (id, name)
WHERE t.id = data.id"""
conn = get_db_connection()
cur = conn.cursor()
n = 100
with tqdm(total=len(rows_to_update)) as pbar:
for i in range(0, len(rows_to_update), n):
psycopg2.extras.execute_values (
cur, sql_query, rows_to_update[i:i + n], template=None, page_size=n
)
conn.commit()
pbar.update(cur.rowcount)
cur.close()
conn.close()
Upvotes: 1