Reputation: 5389
I have 5 cron jobs running a PHP file. The PHP file checks the MySQL database for items that require processing. Since cron launches the scripts all at the same time, it seems that some of the items are processed twice, or even sometimes up to five times.
As soon as one of the scripts SELECTs an item, it immediately sends an UPDATE query so that the other jobs won't pick it up again. But it looks like items are still being processed more than once.
What can I do to prevent the other scripts from processing an item that was previously selected by the other cron jobs?
Upvotes: 1
Views: 890
Reputation: 1642
I think this is a typical use case for semaphores. Take a look at this article:
http://www.re-cycledair.com/php-dark-arts-semaphores
The idea is that each script, as its first step, requests the same semaphore and waits until it is free. It then SELECTs and UPDATEs the DB as you do now, releases the semaphore, and only then starts processing. That way you can be sure that no script is reading the DB while another one is about to write to it.
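To make the idea concrete, here is a minimal sketch in Python rather than PHP, with SQLite standing in for MySQL. It swaps the semaphore for an exclusive file lock (`fcntl.flock`, Unix only), which plays the same role of a mutex shared between separate processes. The `items` table, its `status` column, the lock file path, and the helper names are assumptions made for illustration, not something taken from the question.

```python
import fcntl
import sqlite3
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Block until this process holds an exclusive lock on the file at `path`."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits while another process holds the lock
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def claim_next_item(conn, lock_path):
    """Reserve one waiting item and return its id, or None if nothing is waiting."""
    with exclusive_lock(lock_path):
        # Hypothetical schema: items(id, status)
        row = conn.execute(
            "SELECT id FROM items WHERE status = 'awaiting' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE items SET status = 'processing' WHERE id = ?", (row[0],)
        )
        conn.commit()  # make the reservation visible before the lock is released
        return row[0]
```

Only the SELECT and UPDATE happen while the lock is held. The slow processing itself runs after the lock is released, so the five jobs can still work in parallel on different items.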
Upvotes: 2
Reputation: 670
This issue is called a "race condition". It happens here because SELECT and UPDATE, though called one after another, are not a single atomic operation. There is therefore a chance that two workers SELECT the same job, then the first one does its UPDATE, and then the second one does its UPDATE. Both then proceed to run the same job simultaneously.
There is a workaround, however. You could add a field to your table containing the ID of the current cron job worker (if everything runs on one machine, this can be the PID). In the worker you do the UPDATE first, trying to reserve a job for it:
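The interleaving can be replayed deterministically. The sketch below is Python with an in-memory SQLite database standing in for MySQL; the `jobs` table layout and the helper names are assumptions for illustration. Two "workers" are simulated by simply ordering the calls the way an unlucky schedule would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs (status) VALUES ('awaiting')")


def select_next(conn):
    """Step 1 of the naive approach: look for a waiting job."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'awaiting' LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_processing(conn, job_id):
    """Step 2 of the naive approach: flag the job as taken."""
    conn.execute("UPDATE jobs SET status = 'processing' WHERE id = ?", (job_id,))


# Unlucky schedule: both workers run their SELECT before either runs its UPDATE.
job_seen_by_a = select_next(conn)
job_seen_by_b = select_next(conn)
mark_processing(conn, job_seen_by_a)
mark_processing(conn, job_seen_by_b)  # succeeds silently, nothing signals the clash

both_workers_got_same_job = job_seen_by_a == job_seen_by_b
```

Neither UPDATE fails, so neither worker has any way of knowing that the other one picked up the same job.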
UPDATE jobs
SET worker = $PID, status = 'processing'
WHERE worker IS NULL AND status = 'awaiting' LIMIT 1
Then you verify you successfully reserved a job for this worker:
SELECT * FROM jobs WHERE worker = $PID
If it did not return a row, nothing was reserved for this worker: the other workers got to the remaining jobs first, or no job is awaiting at all. You can try again from step 1 to acquire another job. If it did return a row, you do all your processing, and then run a final UPDATE at the end:
UPDATE jobs
SET status = 'done', worker = NULL
WHERE id = $JOB_ID
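Put together, the reserve, verify, and finish steps might look like the sketch below. It is written in Python against SQLite so that it is self-contained; the `jobs` columns follow the queries above, while the function names and the worker IDs are assumptions. One deliberate difference: stock SQLite builds do not accept `UPDATE ... LIMIT`, so the sketch picks the row with a subquery, whereas MySQL accepts the `LIMIT 1` form shown above directly.

```python
import sqlite3


def reserve_job(conn, worker_id):
    """Tag one waiting job with worker_id in a single statement; return its id or None."""
    cur = conn.execute(
        """
        UPDATE jobs
        SET worker = ?, status = 'processing'
        WHERE id = (
            SELECT id FROM jobs
            WHERE worker IS NULL AND status = 'awaiting'
            ORDER BY id LIMIT 1
        )
        """,
        (worker_id,),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None  # nothing was waiting, so nothing was reserved
    # Verification step: read back the job this worker now owns.
    row = conn.execute(
        "SELECT id FROM jobs WHERE worker = ? AND status = 'processing'",
        (worker_id,),
    ).fetchone()
    return row[0] if row else None


def finish_job(conn, job_id):
    """Mark the job as done and release the worker field."""
    conn.execute(
        "UPDATE jobs SET status = 'done', worker = NULL WHERE id = ?", (job_id,)
    )
    conn.commit()
```

A worker would call `reserve_job(conn, os.getpid())`, process the returned job, and then call `finish_job`. The number of affected rows from the UPDATE already tells the worker whether it reserved anything, so the follow-up SELECT is mainly needed to fetch the job's data.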
Upvotes: 4
Reputation: 10583
I would start again. This train of thought:
it takes time to process one item. about 30 seconds. if i have five cron jobs, five items are processed in 30 seconds
This is just plain wrong and you should not write your code with this in mind.
By that logic, why not make 100 cron jobs and do 100 items per 30 seconds? Answer: because your server is not the Road Runner, and it will fall over and fail.
You should
EDIT
Even with the new knowledge that this runs on a third-party server, my logic still stands: do not start multiple calls that you are not in control of. In fact, this is now even more important.
If you do not know what they are doing with the calls, then you cannot be sure that they are processed in the right order, when they are processed, or whether they are processed at all. So just make one call to ensure you do not get double processing.
A technical solution would be for them to improve the processing time or for you to cache the responses - but that may not be relevant to your situation.
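For comparison, the single-job approach argued for in this answer could be sketched as follows, in Python with SQLite standing in for MySQL. The `items` table, its `status` values, and the `handle_item` callback (standing in for the slow 30-second call) are assumptions for illustration.

```python
import sqlite3


def process_all(conn, handle_item):
    """Process every waiting item one at a time; return the ids handled, in order."""
    handled = []
    while True:
        row = conn.execute(
            "SELECT id FROM items WHERE status = 'awaiting' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return handled  # nothing left to do
        handle_item(row[0])  # the slow call; only one is ever in flight
        conn.execute("UPDATE items SET status = 'done' WHERE id = ?", (row[0],))
        conn.commit()
        handled.append(row[0])
```

With a single worker there is nothing to race against, at the cost of items being handled strictly one after another.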
Upvotes: 0