Reputation: 6276
I have set up 4 CRON jobs to automatically reindex my Sphinx indexes as below:
*/5 * * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf ripples_delta
*/5 * * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf users_delta
30 23 * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge users users_delta --merge-dst-range deleted 0 0 --rotate
0 0 * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge ripples ripples_delta --merge-dst-range deleted 0 0 --rotate
The above shows pgrep, which I hope is being used in every instance to check to see if indexer is already running. My intention here is to prevent any potentially resource hungry overlaps.
The first two cron jobs run every 5 minutes and update the delta indexes for my two main indexes.
The other two run once per day (one at 11:30pm and the other at 12am) and merge the delta indexes into their main counterparts.
My understanding is that following these index merges I need to re-run the indexer on the deltas in order to remove all of the previously merged data and essentially clean them up ready for the next day's indexing.
How can I ensure that this happens automatically upon completion of the merges? Obviously I could just add two more cron jobs but I need them to take place immediately after the relevant merge has finished.
Thanks in advance.
Upvotes: 0
Views: 1912
Reputation: 847
For any periodic task I would suggest creating a lock file at the beginning of the script to avoid re-entrance, and checking whether it already exists when the script starts.
A sample script wrapper (it could be used for periodic MySQL backups as well) is here: http://astellar.com/2012/10/backups-running-at-the-same-time/
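As a rough illustration of the lock idea (not the wrapper from the linked post), here is a minimal sketch that uses a lock directory instead of a lock file, because `mkdir` fails atomically when the directory already exists. The function name `run_locked`, the path in `LOCKDIR`, and the exit code 75 are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical lock location; any path writable by the cron user will do.
LOCKDIR="${LOCKDIR:-/tmp/sphinx-indexer.lock}"

# run_locked CMD [ARGS...]
# Runs CMD only if the lock can be taken. Returns 75 (an arbitrary code
# chosen for this sketch) when another run already holds the lock,
# otherwise returns CMD's own exit status.
run_locked() {
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "lock $LOCKDIR is held, skipping this run" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # Release the lock whether or not CMD succeeded.
    rmdir "$LOCKDIR"
    return "$status"
}
```

A cron script could then call something like `run_locked /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf users_delta`. Note that if the process is killed before `rmdir` runs, the lock directory stays behind and has to be removed by hand.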
Upvotes: 1
Reputation: 21091
Perhaps an even better way is to create a small 'indexing' daemon.
For example:
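<?php
// Loop forever. Once a main index file is more than a day old, rebuild its
// delta, merge it into the main index, update the counter and rebuild the
// delta again. Otherwise just rebuild both deltas.
// Note: mysql_query() was removed in PHP 7; use mysqli or PDO there.
while (1) {
    if (filemtime('path_to_/ripples.sph') < time()-(24*3600)) {
        `indexer --rotate ripples_delta`;
        sleep(10); // brief pause before merging
        `indexer --merge ripples ripples_delta --rotate`;
        mysql_query("UPDATE sph_counter ... ");
        `indexer --rotate ripples_delta`;
    } elseif (filemtime('path_to_/users.sph') < time()-(24*3600)) {
        `indexer --rotate users_delta`;
        sleep(10); // brief pause before merging
        `indexer --merge users users_delta --rotate`;
        mysql_query("UPDATE sph_counter ... ");
        `indexer --rotate users_delta`;
    } else {
        `indexer --rotate ripples_delta users_delta`;
    }
    sleep(5*60);       // wait 5 minutes before the next pass
    clearstatcache();  // drop cached filemtime() results
}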
This way, you just leave this script running indefinitely (I've used screen
for this, but a more robust solution is something like monit).
It makes sure that only one process is ever running at a time, and it takes care of all the actions. If the indexing takes longer, it still maintains a gap of 5 minutes between runs.
To be really clever, you could run a MySQL query to check whether the ripples or users tables have any updates, and not even bother running indexer if not.
Upvotes: 1
Reputation: 21091
Another related issue: you should do
*/6 ... indexer --rotate users_delta ripples_delta
i.e. update both in one command. indexer then builds both indexes, and then performs the rotation.
With two parallel processes, the two rotations could end up stepping on each other.
(Also, with the pgrep, the second of the two delta updates is unlikely ever to run, because the first will always have just been started.)
Also change to, say,
34 23 * ...
i.e. rather than "30", which would fire at exactly the same time as the delta. The delta is likely to have already started, meaning the merges will never run.
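Put together, the adjusted crontab might look roughly like the sketch below. The paths and the `--merge-dst-range deleted 0 0` option are copied from the question; the `4 0` schedule for the second merge is an assumption made here by analogy with `34 23`, so that neither merge lands on a minute where the `*/6` delta job starts.

```
# Rebuild both deltas in a single indexer run, every 6 minutes.
*/6 * * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf users_delta ripples_delta
# Merges, offset by 4 minutes from the delta schedule (minutes 0, 6, 12, ...).
34 23 * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge users users_delta --merge-dst-range deleted 0 0 --rotate
4 0 * * * /usr/bin/pgrep indexer || time /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge ripples ripples_delta --merge-dst-range deleted 0 0 --rotate
```

A merge is still skipped if the delta run that began four minutes earlier has not finished, since the pgrep guard will find it.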
Upvotes: 2
Reputation: 21091
Create a small shell script that
Being a shell script ensures they run in sequence.
Technically you could also leave out 1), as the other */5 job will always have run recently anyway.
You also need a script to run step 3) anyway; Sphinx can't do that for you. http://sphinxsearch.com/bugs/view.php?id=517
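The numbered steps are not shown above, so the following is only one possible reading, modelled on the sequence in the daemon answer: rebuild the delta, merge it into the main index, update the counter table, then rebuild the delta again. The function names, the `ROTATE_WAIT` pause (mirroring the `sleep(10)` in that answer) and the `update_counter` placeholder are assumptions for this sketch; the paths and merge options come from the question.

```shell
#!/bin/sh
# Paths taken from the cron lines in the question; adjust to your install.
INDEXER="${INDEXER:-/usr/local/sphinx/bin/indexer}"
CONFIG="${CONFIG:-/usr/local/sphinx/etc/sphinx.conf}"
# Pause after rebuilding the delta, as the daemon example does (assumption).
ROTATE_WAIT="${ROTATE_WAIT:-10}"

# Hypothetical placeholder: replace the body with whatever updates your
# sph_counter table. The actual UPDATE statement is not given in this thread.
update_counter() {
    echo "TODO: update sph_counter for $1" >&2
}

# merge_and_rebuild MAIN DELTA
# Runs the steps strictly in sequence; the && chain stops at the first
# failure, so a later step never starts after a failed earlier one.
merge_and_rebuild() {
    main=$1
    delta=$2
    "$INDEXER" --config "$CONFIG" --rotate "$delta" &&
    sleep "$ROTATE_WAIT" &&
    "$INDEXER" --config "$CONFIG" --merge "$main" "$delta" \
        --merge-dst-range deleted 0 0 --rotate &&
    update_counter "$main" &&
    "$INDEXER" --config "$CONFIG" --rotate "$delta"
}
```

A single nightly cron entry could then run a script that ends with `merge_and_rebuild users users_delta && merge_and_rebuild ripples ripples_delta`, which keeps the two merges from overlapping and starts each delta rebuild as soon as its merge has finished.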
Upvotes: 1