Reputation: 51
I have a Postgres (PostgreSQL 12.3) table with about 100k rows, and a Python script that reads from this table and performs some action based on the data. I want to run the same script on multiple machines so the data can be processed faster. But when running it from multiple machines, I want to make sure each row is processed by only one machine at a time, essentially by locking that row.
Can you provide some pointers on how row locking can be achieved through Python? I am using the psycopg2 module to read and update data in the table, but did not find a way to lock a row.
Upvotes: 4
Views: 2263
Reputation: 14081
Use SELECT ... FOR UPDATE SKIP LOCKED, which takes a row-level lock on any rows that are returned and skips rows that are already locked by another transaction.
Session 1:
testdb=# create table tt(x text);
CREATE TABLE
testdb=# insert into tt select 'foo';
INSERT 0 1
testdb=# insert into tt select 'bar';
INSERT 0 1
testdb=# BEGIN;
BEGIN
testdb=# SELECT * FROM tt ORDER BY x LIMIT 1 FOR UPDATE SKIP LOCKED;
x
-----
bar
(1 row)
testdb=# -- SELECT in session 2 is performed now.
testdb=# commit;
COMMIT
Session 2:
testdb=# BEGIN;
BEGIN
testdb=# SELECT * FROM tt ORDER BY x LIMIT 1 FOR UPDATE SKIP LOCKED;
x
-----
foo
(1 row)
testdb=# commit;
COMMIT
You won't need to do anything special with psycopg2 for this. As long as your query already has a reasonably small LIMIT (so the SELECT doesn't lock every row), adding FOR UPDATE SKIP LOCKED should do what you're asking for. See the documentation on the locking clause for more details.
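For illustration, here is a minimal psycopg2 worker sketch along these lines. The connection string, the "tasks" table, and its id / payload / processed columns are hypothetical placeholders, not anything from your schema; the point is just the claim-process-update loop inside one transaction.

import psycopg2

# Hypothetical connection; adjust dbname/user/host for your setup.
conn = psycopg2.connect("dbname=testdb")

while True:
    with conn:  # psycopg2: commits on success, rolls back on exception
        with conn.cursor() as cur:
            # Claim one unprocessed row; rows locked by other workers are skipped.
            cur.execute("""
                SELECT id, payload
                FROM tasks
                WHERE NOT processed
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
            """)
            row = cur.fetchone()
            if row is None:
                break  # nothing left to claim
            task_id, payload = row

            # ... do the actual work on payload here ...

            # Mark the row done; the lock is released when the transaction commits.
            cur.execute("UPDATE tasks SET processed = true WHERE id = %s", (task_id,))

conn.close()

Each machine can run this same loop: because the SELECT locks the claimed row and SKIP LOCKED makes other sessions pass over it, no two workers will process the same row at the same time.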
Upvotes: 6