Reputation: 13522
I have a process that selects the next item to process from a MySQL InnoDB Table based on some criteria. When a row has been selected as the next to process, it's processing
field is set to 1 while processing is happening outside the database. I do this so that many processors can be run at once, and they won't process the same row.
If I use transactions to execute the following queries, are they guaranteed to be executed together ( eg. Without any other MySQL connections executing queries. )? If they are not, then multiple processors could get the same id
from the SELECT query and then processing will be redundant.
Pseudo Code Example
Prepare Transaction...
$id = SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1;
UPDATE companies
SET processing = 1
WHERE id = $id;
Execute Transaction
I've been struggling to accomplish this fast enough using a single UPDATE query ( see this question ). Assume that is not an option for the purposes of this question.
Upvotes: 2
Views: 2149
Reputation: 562438
You still have a possibility of a race condition, even though you execute the SELECT followed by the UPDATE in a single transaction. SELECT by itself does not lock anything, so you could have two concurrent sessions both SELECT and get the same id. Then both would attempt to UPDATE, but only one would "win" - the other would have to wait.
To get around this, use the SELECT...FOR UPDATE clause, which creates a lock on the rows it returns.
Prepare Transaction...
$id = SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1
FOR UPDATE;
This means that the lock is created as the row is selected. This is atomic, which means no other session can sneak in and get a lock on the same row. If they try, their transaction will block on the SELECT.
UPDATE companies
SET processing = 1
WHERE id = $id;
Commit Transaction
I changed your "execute transaction" pseudocode to "commit transaction." Statements within a transaction execute immediately, which means they create locks and so on. Then when you COMMIT, the locks are released and any changes are committed. Committed means they can't be rolled back, and they are visible to other transactions.
Here's a quick example of using mysqli to accomplish this:
$mysqli = new mysqli(...);
$mysqli->report_mode = MYSQLI_REPORT_STRICT; /* throw exception on error */
$mysqli->begin_transaction();
$sql = "SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1
FOR UPDATE";
$result = $mysqli->query($sql);
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
$id = $row["id"];
}
$sql = "UPDATE companies
SET processing = 1
WHERE id = ?";
$stmt = $mysqli->prepare($sql);
$stmt->bind_param("i", $id);
$stmt->execute();
$mysqli->commit();
Re your comment:
I tried an experiment and created a table companies
, filled it with 512 rows, then started a transaction and issues the SELECT...FOR UPDATE
statement above. I did this in the mysql client, no need to write PHP code.
Then, before committing my transaction, I examined the locks reported:
mysql> show engine innodb status\G
=====================================
2013-12-04 16:01:28 7f6a00117700 INNODB MONITOR OUTPUT
=====================================
...
---TRANSACTION 30012, ACTIVE 2 sec
2 lock struct(s), heap size 376, 513 row lock(s)
...
Despite using LIMIT 1
, this report shows transaction appears to lock every row in the table (plus 1, for some reason).
So you're right, if you have hundreds of requests per second, it's likely that the transactions are queuing up. You should be able to verify this by watching SHOW PROCESSLIST
and seeing many processes stuck in a state of Locked
(i.e. waiting for access to rows that another thread has locked).
If you have hundreds of requests per second, you may have outgrown the ability for an RDBMS to function as a fake message queue. This isn't what an RDBMS is good at.
There are a variety of scalable message queue frameworks with good integration with PHP, like RabbitMQ, STOMP, AMQP, Gearman, Beanstalk.
Check out http://www.slideshare.net/mwillbanks/message-queues-a-primer-international-php-conference-fall-2012
Upvotes: 4
Reputation: 6167
That depends. There are (in general) differet isolation levels in SQL. In MySQL you can change which one to use using SET TRANSACTION ISOLATION LEVEL
.
While "SERIALIZABLE" (which is the strictest one) still doesn't imply that no other actions are executed in between the ones from your transaction, it DOES make sure that there is no difference if simultanious transactions are executed one after another or not - if it would make a difference, on transaction is rolled back and executed later.
Note however that the stricter the isolation is, the more locking and rollbacks has to be done. So makre sure you really need that before using it.
Upvotes: 0