Reputation: 1147
I have a daily cron job which will get a XML from web service. Sometimes it is large, contains more than 10K products information and the XML size will be 14M example.
What I need to do is parsing XML to object then processing them. The processing is quite complicated. Not like directly put them into the database, I need to do a lot operation on them, and finally put them into many database tables.
It is just in one PHP script. I don't have any experience on dealing with large data.
So the problem is it take a lot of memory. And very long time to do it. I turn my localhost PHP memory_limit to 4G and running 3.5hrs then got successful. But my production host is not allowed such amount memory.
I do a research but I am very confused which is a right way to dealing with this situation.
Here is a sample of my code:
function my_items_import($xml){
$results = new SimpleXMLElement($xml);
$results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
//it will loop over 10K
foreach($results->xpath('//i:Item') as $data) {
$data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
//my processing code here, it will call a other functions to do a lot things
processing($data);
}
unset($results);
}
Upvotes: 0
Views: 2384
Reputation: 1072
Key hints:
One extra question, how long it takes to loop your xml without do [lots things]:
function my_items_import($xml){
$results = new SimpleXMLElement($xml);
$results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
//it will loop over 10K
foreach($results->xpath('//i:Item') as $data) {
$data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
//my processing code here, it will call a other functions to do a lot things
//processing($data);
}
//unset($result);// no need
}
Upvotes: 1
Reputation: 2910
As a start don't use SimpleXMLElement on the whole document. SimpleXMLElement loads everything in the memory and is not efficient for large data. Here is a snippet from a real code. You'll need to accommodate it to your case but hope you'll get the general idea.
$reader = new XMLReader();
$reader->xml($xml);
// Get cursor to first article
while($reader->read() && $reader->name !== 'article');
// Iterate articles
while($reader->name === 'article')
{
$doc = new DOMDocument('1.0', 'UTF-8');
$article = simplexml_import_dom($doc->importNode($reader->expand(), true));
processing($article);
$reader->next('article');
}
$reader->close();
$article is SimpleXMLElement which can be processed further. This way you save a lot of memory by making only single article nodes go into memory. Additionally if each processing() function take long time you can turn it into a background process which runs in separately from the main script and several processing() functions can be started in parallel.
Upvotes: 3