Reputation: 583
So I tried to update multiple rows (at least 50) like this, which takes more than 2 minutes (slow):
from google.cloud import bigquery

client = bigquery.Client()

query_job_list = []
for data in data_list:
    update_statement = ...  # build the UPDATE statement for this row
    query_job = client.query(update_statement)  # submits the job; does not wait
    query_job_list.append(query_job)

# All jobs now run concurrently against the same table.
for query_job in query_job_list:
    query_job.result()  # wait for each job to finish
Running the above code results in:
BadRequest('Could not serialize access due to concurrent update')
I remember reading that the table gets locked, so we cannot update it while another job is still running. (Sorry, I'm not a SQL expert.)
I was just wondering: what is the fastest way to update multiple rows in BigQuery?
Upvotes: 1
Views: 7893
Reputation: 1464
You are seeing that error due to concurrent updates. It generally occurs when multiple mutating DML statements (UPDATE, MERGE, DELETE) run against the same table at the same time and conflict with each other. To maintain consistency, when several mutating DML statements run at roughly the same time, BigQuery may fail one of them. Refer to this doc for more information about DML concurrency and for Google's recommended patterns for better performance.
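The simplest fix for the error itself is to serialize the jobs: wait for each UPDATE to finish before submitting the next one. A minimal sketch, reusing the question's client and data_list and assuming a hypothetical build_update_statement helper:

    # Run the mutating DML jobs one at a time so no two UPDATEs
    # ever execute concurrently against the same table.
    for data in data_list:
        update_statement = build_update_statement(data)  # hypothetical helper
        client.query(update_statement).result()  # blocks until the job finishes

This avoids the serialization error, but it is still one DML job per row and therefore still slow.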
My suggestion would be to space out the DML operations to different tables so they don't operate on the same table at the same time, or to start the next one only after the previous one has completed. Also, as @rtenha mentioned in the comments, avoid submitting large numbers of individual row updates or insertions; instead, group DML operations together whenever possible, as sketched below.
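For example, one common pattern is to load all of the new values into a staging table with a single load job and then apply them with a single MERGE, so only one DML statement ever touches the target table. This is only a sketch: the project, dataset, table, and column names (id, status) are placeholders, not taken from the question.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Placeholder table names; substitute your own.
    target = "my_project.my_dataset.target_table"
    staging = "my_project.my_dataset.staging_table"

    # 1. Load every pending row update into the staging table in one job.
    rows = [{"id": 1, "status": "done"}, {"id": 2, "status": "failed"}]
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("id", "INT64"),
            bigquery.SchemaField("status", "STRING"),
        ],
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_json(rows, staging, job_config=job_config).result()

    # 2. Apply all updates with a single MERGE: one DML job instead of 50.
    merge_sql = f"""
        MERGE `{target}` AS t
        USING `{staging}` AS s
        ON t.id = s.id
        WHEN MATCHED THEN
          UPDATE SET t.status = s.status
    """
    client.query(merge_sql).result()

Because the whole batch is a single statement, there is nothing left to conflict with, and one MERGE over 50 rows is typically far faster than 50 separate UPDATE jobs.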
Upvotes: 2