Reputation: 90043

Do databases always lock non-existent rows after a query or update?

Given:

customer[id BIGINT AUTO_INCREMENT PRIMARY KEY, email VARCHAR(30), count INT]

I'd like to execute the following atomically: Update the customer if he already exists; otherwise, insert a new customer.

In theory this sounds like a perfect fit for SQL-MERGE but the database I am using doesn't support MERGE with AUTO_INCREMENT columns.

https://stackoverflow.com/a/1727788/14731 seems to indicate that if you execute a query or update statement against a non-existent row, the database will lock the index thereby preventing concurrent inserts.

Is this behavior guaranteed by the SQL standard? Are there any databases that do not behave this way?

UPDATE: Sorry, I should have mentioned this earlier: the solution must use READ_COMMITTED transaction isolation unless that is impossible in which case I will accept the use of SERIALIZABLE.

Upvotes: 0

Answers (4)

Gili

Reputation: 90043

Answering my own question since there seems to be a lot of confusion around the topic. It seems that:

-- BAD! DO NOT DO THIS! --
insert customer (email, count) 
select '[email protected]', 0
where not exists (
      select 1 from customer
      where email = '[email protected]'
)

is open to race-conditions (see Only inserting a row if it's not already there). From what I've been able to gather, the only portable solution to this problem:

Pick a key to merge against. This could be the primary key, or another unique key, but it must have a unique constraint.
Try to insert a new row. You must catch the error that will occur if the row already exists.
The hard part is over. At this point, the row is guaranteed to exist and you are protected from race-conditions by the fact that you are holding a write-lock on it (due to the insert from the previous step).
Go ahead and update if needed or select its primary key.

Upvotes: 1

James K. Lowden

Reputation: 7837

This question is asked about once a week on SO, and the answers are almost invariably wrong.

Here's the right one.

insert customer (email, count) 
select '[email protected]', 0
where not exists (
      select 1 from customer
      where email = '[email protected]'
)

update customer set count = count + 1
where email = '[email protected]'

If you like, you can insert a count of 1 and skip the update if the inserted rowcount -- however expressed in your DBMS -- returns 1.

The above syntax is absolutely standard and makes no assumption about locking mechanisms or isolation levels. If it doesn't work, your DBMS is broken.

Many people are under the mistaken impression that the select executes "first" and thus introduces a race condition. No: that select is part of the insert. The insert is atomic. There is no race.

Upvotes: 1

usr

Reputation: 171188

Use Russell Fox's code but use SERIALIZABLE isolation. This will take a range lock so that the non-existing row is logically locked (together with all other non-existing rows in the surrounding key range).

So it looks like this:

BEGIN TRAN
IF EXISTS (SELECT 1 FROM foo WITH (UPDLOCK, HOLDLOCK) WHERE [email] = 'thisemail')
BEGIN
    UPDATE foo...
END
ELSE
BEGIN
    INSERT INTO foo...
END
COMMIT

Most code taken from his answer, but fixed to provided mutual exclusion semantics.

Upvotes: 2

Russell Fox

Reputation: 5435

IF EXISTS (SELECT 1 FROM foo WHERE [email] = 'thisemail')
BEGIN
    UPDATE foo...
END
ELSE
BEGIN
    INSERT INTO foo...
END

Upvotes: -1

Do databases always lock non-existent rows after a query or update?

Answers (4)

Related Questions