rationalboss

Reputation: 5389

PHP scripts in cron jobs are double processing

I have 5 cron jobs running a PHP file. The PHP file checks the MySQL database for items that require processing. Since cron launches the scripts all at the same time, it seems that some of the items are processed twice, or even sometimes up to five times.

When one of the scripts SELECTs an item, it immediately sends an UPDATE query so that the other jobs shouldn't pick it up again. But it looks like items are still being processed twice.

What can I do to prevent the other scripts from processing an item that was already selected by another cron job?

Upvotes: 1

Views: 890

Answers (3)

juanra

Reputation: 1642

I think you have a typical problem that semaphores solve. Take a look at this article:

http://www.re-cycledair.com/php-dark-arts-semaphores

The idea would be, at the start of each script, to ask for the same semaphore and wait until it is free. Then SELECT and UPDATE the database as you already do, release the semaphore, and start processing. This way you can be sure that no more than one script is reading the database while another one is about to write to it.
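The acquire-then-query-then-release idea above can be sketched with a cross-process lock. This is a minimal Python illustration using `fcntl.flock` in place of PHP's `sem_get`/`sem_acquire` (the lock-file path is a hypothetical name; any path shared by all the cron jobs works):

```python
import fcntl

# Hypothetical lock file shared by all cron workers on this machine.
LOCK_PATH = "/tmp/cron_worker.lock"

def with_exclusive_lock(work):
    """Run `work()` only while holding an exclusive cross-process lock.

    flock() blocks until no other process holds the lock, so the
    SELECT + UPDATE pair inside `work` can never interleave with
    another worker's SELECT + UPDATE.
    """
    with open(LOCK_PATH, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            return work()                # e.g. SELECT an item, then UPDATE it
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Each cron job would wrap only the SELECT + UPDATE reservation step in `with_exclusive_lock`, then do the slow 30-second processing outside the lock so workers still run in parallel.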

Upvotes: 2

klkvsk

Reputation: 670

This issue is called a "race condition". In this case it happens because SELECT and UPDATE, though called one after another, are not a single atomic operation. Therefore there is a chance that two jobs SELECT the same item, then the first does its UPDATE, then the second does its UPDATE, and both proceed to run the same job simultaneously.

There is a workaround, however. You could add a field to your table containing the ID of the current worker (if you run everything on one machine, the PID will do). In the worker, do the UPDATE first, trying to reserve a job:

UPDATE jobs 
    SET worker = $PID, status = 'processing' 
    WHERE worker IS NULL AND status = 'awaiting' LIMIT 1

Then you verify you successfully reserved a job for this worker:

SELECT * FROM jobs WHERE worker = $PID

If it does not return a row, another worker was first to reserve it; go back to step 1 to acquire another job. If it does return a row, do all your processing, and then a final UPDATE at the end:

UPDATE jobs 
    SET status = 'done', worker = NULL
    WHERE id = $JOB_ID

Upvotes: 4

Jake N

Reputation: 10583

I would start again. This train of thought:

it takes time to process one item. about 30 seconds. if i have five cron jobs, five items are processed in 30 seconds

This is just plain wrong and you should not write your code with this in mind.

By that logic why not make 100 cron jobs and do 100 per 30 seconds? Answer, because your server is not RoadRunner and it will fall over and fail.

You should

  1. Rethink your problem; this is the most important step, as it will help with 2 and 3.
  2. Optimise your code so that it does not take 30 seconds.
  3. Segment your code so that each job does only one task at a time, which will make it quicker and also ensure that you do not get this 'double processing' effect.

EDIT

Even with the new knowledge that this runs on a third-party server, my logic still stands: do not start multiple calls that you are not in control of. In fact, this is now even more important.

If you do not know what they are doing with the calls, then you cannot be sure in what order they are processed, when, or whether they are processed at all. So just make one call to ensure you do not get double processing.

A technical solution would be for them to improve the processing time, or for you to cache the responses, but that may not be relevant to your situation.

Upvotes: 0
