Gerardo

Reputation: 1948

Best practices for running a PHP cronjob continuously

I need to run some tasks continuously. These tasks consist mainly of retrieving specific records from the DB, analyzing them, and saving the results. The analysis is non-trivial and might take several seconds (perhaps more than a minute). I do not know how frequently new records will be saved in the DB waiting for analysis (there's another cronjob for that).

Should I retrieve records one by one, calling the same analysis function again once it finishes (recursively), and try to keep the cronjob running until there are no more unanalyzed records? Or should I retrieve a fixed number of new records on each cronjob run and have the cronjob called every few minutes?

Upvotes: 0

Views: 2964

Answers (3)

SlappyTheFish

Reputation: 2384

Instead of using a cron job, I would use The Fat Controller to run and repeat tasks. It is basically a daemon which can run any script or application and restart it after it finishes, optionally with a delay between runs.

You can additionally specify a timeout so that long-running scripts will be stopped. This way you don't need to worry about locking, long-running processes, error handling and so on, which helps keep your business logic clean.

There are more examples and use cases on the website:

http://fat-controller.sourceforge.net/

Upvotes: 0

yannis

Reputation: 6335

Or should I retrieve a fixed number of new records on each cronjob run and have the cronjob called every few minutes?

That. And you'll have to do some trial-and-error measurements first to decide on an optimal batch size.

Of course it heavily depends on what you are actually doing, how many DB-intensive cron jobs you are running simultaneously, and what kind of setup you have. I recently spent a day looking for a Heisenbug in a very intensive script that migrated images from the DB to S3 (and created a few thumbnails while migrating). The problem was that, due to an undocumented behaviour in our ORM, the connection to the database was lost at some point, because posting to S3 plus thumbnail generation for certain images took a little longer than the connection time limit. It was an ugly situation, one that would probably have cost more than a day to identify in a recursive do-it-all scheme.

You'd be better off with the safe approach, even if it means a little time lost between cron executions.

Upvotes: 1

drew010

Reputation: 69927

A job queue server may work well for this scenario (see ActiveMQ or MemcacheQ, for example). Rather than adding the un-analyzed records directly to the database, send them to a queue for processing. Your cron job can then retrieve some items from the queue for processing, and if one job takes so long that the cron job is triggered again, the next run will simply grab the next items in the queue.
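A rough sketch of how that could look, assuming MemcacheQ (which speaks the memcache protocol, by default on port 22201) and the PHP Memcache extension; the queue name analysis_queue, the record ID, and the analyze_record() function are placeholders:

```php
<?php
// enqueue.php - producer: push a record ID onto the queue instead of
// flagging it in the DB (run by whatever saves new records).
$queue = new Memcache();
$queue->connect('127.0.0.1', 22201);           // MemcacheQ's default port (assumption)
$queue->set('analysis_queue', '123', 0, 0);    // '123' = ID of the record to analyze

// worker.php - consumer, run from cron: pop a bounded number of items per run.
$queue = new Memcache();
$queue->connect('127.0.0.1', 22201);
for ($i = 0; $i < 50; $i++) {
    $recordId = $queue->get('analysis_queue'); // MemcacheQ dequeues on GET
    if ($recordId === false) {
        break;                                 // queue is empty, end this run
    }
    analyze_record((int) $recordId);           // your analysis function (hypothetical)
}
```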

Personally, I would have the cron job retrieve a fixed number of records per run, just to make sure the script doesn't get stuck processing for a very long time if new records keep getting added faster than the processor can keep up. It would probably finish everything eventually, but a single run could end up going on for a very long time.
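Something along these lines, using PDO; the table and column names (records, analyzed, id, payload), the connection details, the batch size, and analyze_record() are all assumptions:

```php
<?php
// Bounded cron run: fetch at most $batchSize unanalyzed records and exit.
$batchSize = 25;

$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$rows = $db->query(
    "SELECT id, payload FROM records WHERE analyzed = 0 LIMIT $batchSize"
)->fetchAll(PDO::FETCH_ASSOC);

$mark = $db->prepare("UPDATE records SET analyzed = 1 WHERE id = ?");

foreach ($rows as $row) {
    analyze_record($row);               // the long-running analysis (hypothetical)
    $mark->execute(array($row['id']));  // mark it done so the next run skips it
}
// When the batch is done the script exits; cron starts the next run later.
```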

You may also consider creating a lock file that the job can check to see if the task processor is already running. For example, when the cron job starts, check for the existence of a file (e.g. processor.lock); if it exists, exit; if not, create the file, process some records, and delete the file when done.
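A minimal sketch of that lock-file guard; the lock path and process_some_records() are placeholders:

```php
<?php
// Simple lock-file guard: skip this run if a previous run is still going.
$lockFile = '/tmp/processor.lock';

if (file_exists($lockFile)) {
    exit(0);               // another instance appears to be running; bail out quietly
}

touch($lockFile);          // claim the lock

try {
    process_some_records();            // fetch and analyze a batch (hypothetical helper)
} catch (Exception $e) {
    error_log($e->getMessage());       // log and fall through so the lock is released
}

unlink($lockFile);         // release the lock
```

Note that if the script dies hard, a plain lock file can be left behind; using flock() on an open file handle avoids stale locks, since the OS releases the lock when the process exits.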

Hope that helps.

Upvotes: 6
