Reputation: 6818
I have one cronjob that runs every 60 minutes, but recently, for some reason, it has been running slowly.
Env: centos5 + apache2 + mysql5.5 + php 5.3.3 / raid 10/10k HDD / 16gig ram / 4 xeon processor
Here's what the cronjob does:
1) parse the last 60 minutes of data
   a) one process parses user agents and saves the data to the database
   b) one process parses impressions/clicks on the website and saves them to the database
2) from the data in step 1
   a) build a small report and send emails to the administrator/business
   b) save the report into a daily table (available in the admin section)
I now see 8 processes (all running the same file) when I run the command ps auxf | grep process_stats_hourly.php
(I found this command on Stack Overflow)
Technically I should only have 1, not 8.
Is there any tool in CentOS, or something I can do, to make sure my cronjob runs every hour and does not overlap the next one?
Thanks
Upvotes: 33
Views: 1428
Reputation: 58681
Is there any tool in CentOS, or something I can do, to make sure my cronjob runs every hour and does not overlap the next one?
Yes. CentOS' standard util-linux package provides a command-line convenience for filesystem locking. As Digital Precision suggested, a lockfile is an easy way to synchronize processes.
Try invoking your cronjob as follows:
flock -n /var/tmp/stats.lock process_stats_hourly.php || logger -p cron.err 'Unable to lock stats.lock'
You'll need to edit paths and adjust for $PATH as appropriate. That invocation will attempt to lock stats.lock, spawning your stats script if successful, otherwise giving up and logging the failure.
Alternatively, your script could call PHP's flock() itself to achieve the same effect, but the flock(1) utility is already there for you.
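If you would rather keep the locking inside a wrapper script than in the crontab line, the same flock(1) utility can lock a file descriptor. The following is only a sketch: the function name run_exclusive and the exit code 75 are illustrative choices, not something flock or cron prescribes.

```shell
#!/bin/sh
# Sketch: run a command only while holding an exclusive lock.
# run_exclusive and the exit code 75 are hypothetical choices for this example.

run_exclusive() {
    lockfile=$1
    shift
    (
        # Try to take an exclusive lock on file descriptor 9 without waiting.
        # flock -n fails at once if another process already holds the lock.
        flock -n 9 || exit 75
        # Run the real job; the lock is released when this subshell exits.
        "$@"
    ) 9>"$lockfile"
}

# Possible use (the php invocation is an assumption; adjust paths as needed):
# run_exclusive /var/tmp/stats.lock php process_stats_hourly.php
```

If a late run is preferable to a skipped run, replacing -n with -w and a number of seconds makes flock wait up to that long for the lock instead of giving up immediately.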
Upvotes: 2
Reputation: 1
You might use strace -p 1234, where 1234 is the process ID of one of the processes that has been running too long. Perhaps you'll see why it is so slow, or even blocked.
Upvotes: 2
Reputation: 3740
The output of ps also shows when each process started (see the STARTED column).
$ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT STARTED   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    18:55     0:00 [kthreadd]
                                                    ^^^^^^^
(...)
Or you can customize the output:
$ ps axfo start,command
STARTED COMMAND
18:55 [kthreadd]
(...)
That way you can confirm whether the runs are overlapping.
Upvotes: 6
Reputation: 19999
You should use a lockfile mechanism within your process_stats_hourly.php script. It doesn't have to be anything overly complex: you could have PHP write the PID of the running process to a file such as /var/mydir/process_stats_hourly.txt. Then, if processing the stats takes longer than an hour and cron kicks off another instance of process_stats_hourly.php, the new instance can check whether the lockfile already exists and, if it does, exit without running.
However, you are left with the problem of how to "re-queue" the hourly script if it found the lock file and couldn't start.
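One weakness of a plain "does the file exist" check is a stale lockfile left behind by a crashed run. Below is a rough sketch of the PID-file idea that also tests whether the recorded PID is still alive. It is written as a shell wrapper rather than in PHP, and the function name acquire_pidfile is hypothetical.

```shell
#!/bin/sh
# Sketch: PID-file lock with a stale-lock check.
# acquire_pidfile is a hypothetical helper name used only for illustration.

acquire_pidfile() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        old_pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only reports whether we could signal
        # that PID. It also fails for a live process owned by another user,
        # so this check assumes every run uses the same account.
        if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
            return 1    # a previous run still appears to be alive
        fi
    fi
    # No lockfile, or its owner is gone: record our own PID.
    # Check-then-write is not atomic, so two simultaneous starts could both
    # pass; flock(1) avoids that race.
    echo "$$" > "$pidfile"
}

# Possible use (the path comes from the answer above; the php call is an assumption):
# acquire_pidfile /var/mydir/process_stats_hourly.txt || exit 0
# php process_stats_hourly.php
# rm -f /var/mydir/process_stats_hourly.txt
```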
Upvotes: 4
Reputation: 34031
How often is that logfile rotated?
A log-parsing job suddenly taking longer than usual sounds like the log isn't being rotated and is now too big for the parser to handle efficiently.
Try resetting the logfile and see if the job runs faster. If that solves the problem, I recommend logrotate as a means of preventing the problem in the future.
Upvotes: 1
Reputation: 49885
Your hardware seems to be good enough to process this.
1) Check if you already have hanging processes. Using ps auxf (see tcurvelo's answer), check whether one or more processes are taking too many resources. Maybe you don't have enough resources left to run your cronjob.
2) Check your network connections: If your database and your cronjob are on different servers, check the response time between the two machines. Network issues may be making the cronjob wait for packets to come back.
You can use: Netcat, Iperf, mtr or ttcp
3) Server configuration: Is your server configured correctly? Are your OS and MySQL set up correctly? I would recommend reading these articles:
http://www3.wiredgorilla.com/content/view/220/53/
http://www.vr.org/knowledgebase/1002/Optimize-and-disable-default-CentOS-services.html
http://dev.mysql.com/doc/refman/5.1/en/starting-server.html
http://www.linux-mag.com/id/7473/
4) Check your database: Make sure your database has the correct indexes and that your queries are optimized. Read this article about the explain command
If a query over a few hundred thousand records takes a long time to execute, it will slow down the rest of your cronjob; if that query runs inside a loop, it's even worse.
Read these articles:
http://dev.mysql.com/doc/refman/5.0/en/optimization.html
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
http://blog.fedecarg.com/2008/06/12/10-great-articles-for-optimizing-mysql-queries/
5) Trace and optimize your PHP code: Make sure your PHP code runs as fast as possible.
Read these articles:
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://code.google.com/speed/articles/optimizing-php.html
http://ilia.ws/archives/12-PHP-Optimization-Tricks.html
A good technique to validate your cronjob is to trace the script: around each step of the cronjob, print debug output that includes how much memory was used and how long the step took to execute, e.g.:
<?php
echo "\n-------------- DEBUG --------------\n";
echo "memory (start): " . memory_get_usage(TRUE) . "\n";
// Record the wall-clock time before the step under test
$start = microtime(TRUE);
// some process
$end = microtime(TRUE);
echo "\n-------------- DEBUG --------------\n";
echo "memory after some process: " . memory_get_usage(TRUE) . "\n";
echo "executed time: " . ($end - $start) . " seconds\n";
By doing that you can easily find how much memory each process uses and how long it takes to execute.
6) External servers/web service calls: Does your cronjob call external servers or web services? If so, make sure they respond as fast as possible. If you request data from a third-party server and it takes a few seconds to answer, that will slow down your cronjob, especially if the calls are made in loops.
Try that and let me know what you find.
Upvotes: 46
Reputation: 1191
You could add a step to the cronjob to check the output of your above command:
ps auxf | grep process_stats_hourly.php
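Keep looping until the command returns nothing, which indicates that the process isn't running, then allow the remaining code to execute. Note that grep usually matches its own command line in the ps output, so filter it out (for example with grep -v grep), otherwise the loop will never finish.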
Upvotes: -1