Reputation: 5057
Is there a way to select rows in PostgreSQL that aren't locked? I have a multi-threaded app that will do:
Select... order by id desc limit 1 for update
on a table.
If multiple threads run this query, they both try to pull back the same row.
One gets the row lock, the other blocks and then fails after the first one updates the row. What I'd really like is for the second thread to get the first row that matches the WHERE clause and isn't already locked.
To clarify, I want each thread to immediately update the first available row after doing the select.
So if there are rows with ID: 1,2,3,4, the first thread would come in, select the row with ID=4 and immediately update it.
If during that transaction a second thread comes in, I'd like it to get the row with ID=3 and immediately update that row.
Neither FOR SHARE nor NOWAIT will accomplish this, because the WHERE clause still matches the locked row (ID=4 in my example). Basically what I'd like is something like "AND NOT LOCKED" in the WHERE clause.
Users
-----------------------------------------
ID | Name | flags
-----------------------------------------
1 | bob | 0
2 | fred | 1
3 | tom | 0
4 | ed | 0
If the query is "Select ID from users where flags = 0 order by ID desc limit 1" and when a row is returned the next thing is "Update Users set flags = 1 where ID = 4" (using the ID that was returned), then I'd like the first thread in to grab the row with ID 4 and the next one in to grab the row with ID 3.
If I append "For Update" to the select, then the first thread gets the row, and the second one blocks and then returns nothing, because once the first transaction commits the WHERE clause is no longer satisfied.
If I don't use "For Update", then I need to add a WHERE clause on the subsequent update (WHERE flags = 0) so only one thread can update the row. The second thread will select the same row as the first, but the second thread's update will fail.
Either way the second thread fails to get a row and update it, because I can't get the database to give row 4 to the first thread and row 3 to the second thread while the transactions overlap.
Upvotes: 53
Views: 21670
Reputation: 1303
This feature, SELECT ... SKIP LOCKED, was implemented in PostgreSQL 9.5. http://www.depesz.com/2014/10/10/waiting-for-9-5-implement-skip-locked-for-row-level-locks/
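With 9.5 and later, the query from the question can be written directly (a sketch against the example Users table from the question):

```sql
-- Take the highest-ID unflagged row, skipping any row
-- that another transaction has already locked.
SELECT ID
FROM Users
WHERE flags = 0
ORDER BY ID DESC
LIMIT 1
FOR UPDATE SKIP LOCKED;
```

Each concurrent transaction then gets a different row (ID=4 for the first, ID=3 for the second) instead of blocking.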
Upvotes: 41
Reputation: 3830
For use with multiple threads, or in a cluster? How about this?
START TRANSACTION;
// All threads retrieve the same task list.
// If the result set is very big, use a cursor,
// or a callback interface provided by the ORM framework.
var ids = SELECT id FROM tableName WHERE k1=v1;
// Each thread tries to grab an unlocked record to process.
// Note: FOR UPDATE NOWAIT raises an error when the row is
// already locked, so catch it and move on to the next row.
for ( id in ids ) {
    try {
        var rec = SELECT ... FROM tableName WHERE id =#id# FOR UPDATE NOWAIT;
        ... // do something with the record
    } catch ( lockNotAvailable ) {
        // Row is locked by another worker; skip it.
    }
}
COMMIT;
Upvotes: 0
Reputation: 101
No no nooo :-)
I know what the author means. I have a similar situation and I came up with a nice solution. First I will describe my situation. I have a table in which I store messages that have to be sent at a specific time. PG doesn't support scheduled execution of functions, so we have to use daemons (or cron). I use a custom-written script that opens several parallel processes. Every process selects a set of messages that have to be sent with a precision of +1 sec / -1 sec. The table itself is dynamically updated with new messages.
So every process needs to download a set of rows. This set of rows cannot be downloaded by another process, because that would make a big mess (some people would receive a couple of messages when they should receive only one). That is why we need to lock the rows. The query to download a set of messages with the lock:
FOR messages IN
    SELECT * FROM public.messages
    WHERE sendTime >= CURRENT_TIMESTAMP - '1 SECOND'::INTERVAL
      AND sendTime <= CURRENT_TIMESTAMP + '1 SECOND'::INTERVAL
      AND sent IS FALSE
    FOR UPDATE
LOOP
    -- DO SMTH
END LOOP;
A process with this query is started every 0.5 sec, so each new query ends up waiting for the previous one to unlock the rows. This approach creates enormous delays. Even if we use NOWAIT, the query will raise an exception, which we don't want, because there might be new messages in the table that have to be sent. If we simply use FOR SHARE, the query will execute properly, but it will still take a lot of time, creating huge delays.
In order to make it work we do a little magic:
changing the query:
FOR messages IN
    SELECT * FROM public.messages
    WHERE sendTime >= CURRENT_TIMESTAMP - '1 SECOND'::INTERVAL
      AND sendTime <= CURRENT_TIMESTAMP + '1 SECOND'::INTERVAL
      AND sent IS FALSE
      AND is_locked(msg_id) IS FALSE
    FOR SHARE
LOOP
    -- DO SMTH
END LOOP;
the mysterious function 'is_locked(msg_id)' looks like this:
CREATE OR REPLACE FUNCTION is_locked(integer) RETURNS BOOLEAN AS $$
DECLARE
    id integer;
    checkout_id integer;
    is_it boolean;
BEGIN
    checkout_id := $1;
    is_it := FALSE;
    BEGIN
        -- we use FOR UPDATE to attempt a lock and NOWAIT to get the error immediately
        id := msg_id FROM public.messages WHERE msg_id = checkout_id FOR UPDATE NOWAIT;
    EXCEPTION
        WHEN lock_not_available THEN
            is_it := TRUE;
    END;
    RETURN is_it;
END;
$$ LANGUAGE 'plpgsql' VOLATILE COST 100;
Of course this function can be adapted to work on any table in your database. In my opinion it is better to create one check function per table. Adding more things to this function can only make it slower. Checking this clause takes longer anyway, so there is no need to make it even slower. For me this is the complete solution and it works perfectly.
Now when I have my 50 processes running in parallel, every process has a unique set of fresh messages to send. Once they are sent I just update the row with sent = TRUE and never go back to it again.
I hope this solution will also work for you (author). If you have any questions just let me know :-)
Oh, and let me know if this worked for you as well.
Upvotes: 10
Reputation: 6548
My solution is to use the UPDATE statement with the RETURNING clause.
Users
-----------------------------------
ID | Name | flags
-----------------------------------
1 | bob | 0
2 | fred | 1
3 | tom | 0
4 | ed | 0
Instead of SELECT .. FOR UPDATE
use
BEGIN;
UPDATE "Users"
SET ...
WHERE ...
RETURNING ( column list );
COMMIT;
Because the UPDATE statement takes a row-level lock on each row it modifies, concurrent updates of the same row are serialized. Reads are still allowed, but they see only the data as it was before the updating transaction committed.
Reference: Concurrency Control Chapter of Pg docs.
Upvotes: 1
Reputation: 36719
If you are trying to implement a queue, take a look at PGQ, which has solved this and other problems already. http://wiki.postgresql.org/wiki/PGQ_Tutorial
Upvotes: 5
Reputation: 71
I use something like this:
select *
into l_sms
from sms
where prefix_id = l_prefix_id
and invoice_id is null
and pg_try_advisory_lock(sms_id)
order by suffix
limit 1;
and don't forget to call pg_advisory_unlock
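For example (a sketch, reusing the sms_id name from the query above), once the row has been processed:

```sql
-- pg_try_advisory_lock takes a session-level lock, which is NOT
-- released at COMMIT -- only by an explicit unlock or when the
-- session ends. (pg_try_advisory_xact_lock is an alternative that
-- releases automatically at transaction end.)
SELECT pg_advisory_unlock(l_sms.sms_id);
```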
Upvotes: 7
Reputation: 9
^^ That works. Consider having an "immediate" status of "locked".
Let's say your table has a status column, and the possible statuses are: 1=pending, 2=locked, 3=processed, 4=fail, 5=rejected.
Every new record gets inserted with status pending(1)
Your program does: "update mytable set status = 2 where id = (select id from mytable where name like '%John%' and status = 1 limit 1) returning id, name, surname"
Then your program does its thing, and if it comes to the conclusion that this thread shouldn't have processed that row at all, it does: "update mytable set status = 1 where id = ?"
Otherwise it updates to one of the other statuses.
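On PostgreSQL 9.5+ the subselect can additionally lock the row and skip rows already claimed by another thread, which closes the window where two threads pick the same id (a sketch reusing this answer's mytable columns):

```sql
UPDATE mytable
SET status = 2   -- locked
WHERE id = (
    SELECT id
    FROM mytable
    WHERE name LIKE '%John%'
      AND status = 1   -- pending
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, name, surname;
```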
Upvotes: 0
Reputation:
I faced the same problem in our application and came up with a solution that is very similar to Grant Johnson's approach. A FIFO or LIFO pipe was not an option because we have a cluster of application servers accessing one DB. What we do is a
SELECT ... WHERE FLAG=0 ... FOR UPDATE
immediately followed by an UPDATE ... SET FLAG=1 WHERE ID=:id
as soon as possible, in order to keep the lock time as low as possible. Depending on the table's column count and sizes, it might help to fetch only the ID in the first select and, once you've marked the row, fetch the remaining data. A stored procedure can reduce the number of round-trips even more.
Upvotes: 0
Reputation: 1244
It appears that you are trying to do something like grab the highest priority item in a queue that is not already being taken care of by another process.
A likely solution is to add a where clause limiting it to unhandled requests:
select * from queue where flag=0 order by id desc for update;
update queue set flag=1 where id=:id;
--if you really want the lock:
select * from queue where id=:id for update;
...
Hopefully, the second transaction will block while the update to the flag happens, then it will be able to continue, but the flag will limit it to the next in line.
It is also likely that using the serializable isolation level, you can get the result you want without all of this insanity.
Depending on the nature of your application, there may be better ways of implementing this than in the database, such as a FIFO or LIFO pipe. Additionally, it may be possible to reverse the order that you need them in, and use a sequence to ensure that they are processed sequentially.
Upvotes: 2
Reputation: 1071
How about the following? It might be treated more atomically than the other examples, but it should still be tested to make sure my assumptions aren't wrong.
UPDATE users SET flags = 1 WHERE id = ( SELECT id FROM users WHERE flags = 0 ORDER BY id DESC LIMIT 1 ) RETURNING ...;
You'll probably still be stuck with whatever locking scheme Postgres uses internally to supply consistent SELECT results in the face of simultaneous UPDATEs.
Upvotes: 0
Reputation: 5057
Since I haven't found a better answer yet, I've decided to use locking within my app to synchronize access to the code that does this query.
Upvotes: 0
Reputation: 1071
What are you trying to accomplish? Can you better explain why neither unlocked row updates nor full transactions will do what you want?
Better yet, can you prevent contention and simply have each thread use a different offset? This won't work well if the relevant portion of the table is being updated frequently; you'll still have collisions but only during heavy insert load.
Select... order by id desc offset THREAD_NUMBER limit 1 for update
Upvotes: -1
Reputation: 3344
Looks like you're looking for a SELECT FOR SHARE.
http://www.postgresql.org/docs/8.3/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE
FOR SHARE behaves similarly, except that it acquires a shared rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, or SELECT FOR UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE.
If specific tables are named in FOR UPDATE or FOR SHARE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual. A FOR UPDATE or FOR SHARE clause without a table list affects all tables used in the command. If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query.
Multiple FOR UPDATE and FOR SHARE clauses can be written if it is necessary to specify different locking behavior for different tables. If the same table is mentioned (or implicitly affected) by both FOR UPDATE and FOR SHARE clauses, then it is processed as FOR UPDATE. Similarly, a table is processed as NOWAIT if that is specified in any of the clauses affecting it.
FOR UPDATE and FOR SHARE cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.
Upvotes: 0