dognose
dognose

Reputation: 20909

Insert with Subselect - Atomic Operation?

I know, that mysql supports auto-increment values, but no dependent auto-increment values.

i.e. if you have a table like this:

id | element | innerId
1  | a       | 1
2  | a       | 2
3  | b       | 1

And you insert another b-element, you need to compute the innerId on your own, (Excpected insert would be "2")

What would be the best way to achieve this behaviour? I do not know the number of elements, so i cannot create dedicated tables for them, where I just could derrive an id.

(The example is simplyfied)

The target that should be achieved, is that any element "type" (where the number is unknown, possibly infitine -1 should have it's own, gap-less id.

If I would use something like

INSERT INTO 
  myTable t1 
   (id,element, innerId)
   VALUES
   (null, 'b', (SELECT COUNT(*) FROM myTable t2 WHERE t2.element = "b") +1)

http://sqlfiddle.com/#!2/2f4543/1

Will this return the expected result under all circumstances? I mean it works, but what about concurrency? Are Inserts with SubSelects still atomic or might there be a szenario, where two inserts will try to insert the same id? (Especially if a transactional insert is pending?)

Would it be better to try to achieve this with the programming language (i.e. Java)? Or is it easier to implement this logic as close to the database engine as possible?

Since I'm using an aggregation to compute the next innerId, i think using SELECT...FOR UPDATE can not avoid the problem in case of other transactions having pending commits, right?

ps.: I could ofc. just bruteforce the insert - starting at the current max value per element - with a unique key constraint on (element,innerId) until there is no foreignKey-violation - but isn't there a nicer way?

According to Make one ID with auto_increment depending on another ID - possible? it would be possible with a composite primary key on - in my case - innerId and element. But according to this setting MySQL auto_increment to be dependent on two other primary keys that works only for MyIsam (I have InnoDB)


Now i'm confused even more. I tried to use 2 different php scripts to insert data, using the query above. While script one has a "sleep" for 15 seconds in order to allow me to call script two (which should simulate the concurrent modification) - The result was correct when using one query.

(ps.: mysql(?!i)-functions only for quick debugging)

Base Data:

enter image description here

Script 1:

mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page1')");

sleep(15);

//mysql_query("ROLLBACK;");
mysql_query("COMMIT;");

Script 2:

//mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page2')");
//mysql_query("COMMIT;");

I would have expected that the page2 insert would have happened before the page1 insert, cause it's running without any transaction. But in fact, the page1 insert happened FIRST, causing the second script to also be delayed for about 15 seconds...

(ignore the AC-Id, played around a bit)

enter image description here

When using Rollback on the first script, the second script is still delayed for 15 seconds, and then picking up the correct innerId:

enter image description here

So:

Using the selection and insert in seperate, non-transactional statements like this (on page 2, simulating the concurrent modification):

$nextId = mysql_query("SELECT MAX(t2.innerID) as a FROM insertTest t2 WHERE element='a'");
$nextId = mysql_fetch_array($nextId);
$nextId = $nextId["a"] +1;
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', $nextId, 'page2')");

leads to the error I was trying to avoid:

enter image description here

so why does It work in the concurrent szenario when each modification is one query? Are inserts with subselects atomic?

Upvotes: 2

Views: 1173

Answers (2)

Oncaphillis
Oncaphillis

Reputation: 1908

From what I read here: Atomicity multiple MySQL subqueries in an INSERT/UPDATE query? your query seems to be atomic. I've tested it on my MySQL with InnoDB with 4 different programs trying to execute the query 100000 times each. After that I was able to create a combined Unique key on (element,innerid) and it worked well, so it didn't seem to generate a duplicate. However I've got:

Deadlock found when trying to get lock

So you might want to consider this http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html

EDIT: It seems I could circumvent the deadlock by changing the SQL to

INSERT INTO test (id,element,innerId) VALUES(null, "b", (SELECT Count(*) FROM test t2 WHERE element = 'b' FOR UPDATE )+1);

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270391

Well, all (or almost) all databases support the necessary functionality for calculating innerid according to your rules. It is called a trigger, specifically a before insert trigger.

Your particular version will not work consistently in a multi-user environment. Few, if any, databases generate read locks on a table when starting an insert. That means that two insert statements issued very close together would generate the same value for innerid.

Because of concurrency considerations, you should do this calculation in the database, using triggers rather than on the application side.

You always have the possibility of calculating innerid when you need it, rather than when you insert the value. This is computationally expensive, requiring either an order by (using variables) or a correlated subquery. Other databases support window/analytic functions, making such a calculation much easier to express.

Upvotes: 1

Related Questions