Reputation: 4985
I would like to run a PHP script as a cronjob every night. The PHP script will import an XML file with about 145,000 products. Each product contains a link to an image, which will be downloaded and saved on the server as well. I can imagine that this may cause quite a heavy load. So my question is: would it be better to split the work? And if so, what would be a good solution? Several cronjobs with a pause of a few minutes between them? Running another PHP file using exec (I guess not, because I can't imagine that would make much of a difference)? Or something else? Or should I just use one script to import all products at once?
Thanks in advance.
Upvotes: 1
Views: 747
Reputation: 1090
To add to the previous answer, the best solution is to optimize your scripts.
Upvotes: 1
Reputation: 2384
It depends a lot on how you've written the script: in particular, whether it leaks open files or database connections. It also depends on which version of PHP you're using. In PHP 5.3 a lot was done to improve garbage collection:
http://www.php.net/manual/en/features.gc.performance-considerations.php
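As an illustration, here is a minimal sketch of keeping memory in check during a long import loop. gc_enable() and gc_collect_cycles() are standard PHP functions, while $products and importProduct() are placeholders for your own code:

```php
<?php
// Long-running imports can accumulate circular references; on PHP >= 5.3
// the collector can be triggered explicitly to keep memory bounded.
gc_enable();

$processed = 0;
foreach ($products as $product) {  // $products: your parsed product list
    importProduct($product);       // placeholder for your import logic
    unset($product);

    if (++$processed % 1000 === 0) {
        gc_collect_cycles();       // collect cycles every 1000 products
    }
}
```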
If it's not important that the operation is transactional, i.e. all-or-nothing (for example, if it fails halfway through), then I would be tempted to tackle this in chunks, where each run of the script processes the next x items, with x varying depending on how long each run takes. You then keep repeating the script until there is nothing left to do, as sketched below.
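To make the idea concrete, here is a rough sketch of one chunked run, assuming the current offset is kept in a small state file; loadProducts() and importProduct() are hypothetical helpers, and the exit-status convention is just an example to be matched against however you configure the repeating runner:

```php
<?php
// Each invocation imports the next $chunkSize products, then records
// where it got to, so repeated runs eventually work through the file.
$chunkSize = 500;
$stateFile = '/tmp/import-offset.txt';

$offset   = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
$products = loadProducts('/path/to/products.xml', $offset, $chunkSize);

foreach ($products as $product) {
    importProduct($product);  // insert the row, download the image, etc.
    $offset++;
}

file_put_contents($stateFile, $offset);

// Report whether there is more work left via the exit status; which
// status means "run me again" depends on how you configure the runner.
exit(count($products) === $chunkSize ? 1 : 0);
```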
To do this, I'd recommend using a tool called the Fat Controller:
http://fat-controller.sourceforge.net
It can keep repeating the script and stop once everything is done. You tell the Fat Controller that there's more to do, or that everything is done, via exit statuses from the PHP script. There are some use cases on the Fat Controller website, for example: http://fat-controller.sourceforge.net/use-cases.html#generating-newsletters
You can also use the Fat Controller to run processes in parallel to speed things up; just be careful not to run too many in parallel and slow things down. If you're writing to a database, you'll ultimately be limited by the hard disk, which, unless you have something fancy, means your optimum concurrency will be 1.
The final question is how to trigger all of this, and you're probably best off starting the Fat Controller from cron.
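For reference, a nightly crontab entry could look something like this; the path and start command are assumptions, so check the Fat Controller documentation for the exact invocation on your system:

```
# Start the import run every night at 02:00 (paths are hypothetical)
0 2 * * * /path/to/fatcontroller/start.sh >> /var/log/product-import.log 2>&1
```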
There's plenty of documentation and examples on the Fat Controller website, but if you need any specific guidance then I'd be happy to help.
Upvotes: 1