Tim Yao

Reputation: 1147

What is the best way to process large data in PHP?

I have a daily cron job which fetches an XML file from a web service. Sometimes it is large, containing information for more than 10K products, and the XML can be around 14 MB, for example.

What I need to do is parse the XML into objects and then process them. The processing is quite complicated: it is not a matter of putting them straight into the database; I need to perform many operations on them and finally write them into many database tables.

It is all done in one PHP script. I don't have any experience dealing with large data.

The problem is that it takes a lot of memory and a very long time. I raised my localhost PHP memory_limit to 4G and the script finished successfully after 3.5 hours, but my production host does not allow that much memory.

I have done some research, but I am still confused about the right way to handle this situation.

Here is a sample of my code:

function my_items_import($xml){

    $results = new SimpleXMLElement($xml);
    $results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');

    //it will loop over 10K items
    foreach($results->xpath('//i:Item') as $data) {

        $data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');

        //my processing code here; it calls other functions to do many things
        processing($data);

    }
    unset($results);
}

Upvotes: 0

Views: 2384

Answers (2)

Zac

Reputation: 1072

Key hints:

  1. Dispose of data during processing (see the sketch after this list).
    • "Dispose" here means overwriting it with blank data. By the way, unset is slower than overwriting with null.
  2. Use functions or static methods; avoid creating OOP instances as much as possible.
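A minimal sketch of hint 1, applied to the loop from the question (processing() is the question's function; the rest is illustrative):

    foreach($results->xpath('//i:Item') as $data) {
        processing($data);
        $data = null; // overwrite with null instead of unset() to release the reference
    }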

One extra question: how long does it take to loop over your XML without doing [lots of things]?

function my_items_import($xml){

    $results = new SimpleXMLElement($xml);
    $results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');

    //it will loop over 10K items
    foreach($results->xpath('//i:Item') as $data) {

        $data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');

        //my processing code here; it calls other functions to do many things
        //processing($data);

    }
    //unset($results); // no need
}
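For example, you could time the bare loop like this (microtime() and memory_get_peak_usage() are standard PHP; the call wraps the stripped-down function above):

    $start = microtime(true);
    my_items_import($xml); // the version above, with processing() commented out
    echo 'Loop time: ' . round(microtime(true) - $start, 2) . "s\n";
    echo 'Peak memory: ' . round(memory_get_peak_usage(true) / 1048576) . " MB\n";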

Upvotes: 1

Alexander Nenkov

Reputation: 2910

As a start, don't use SimpleXMLElement on the whole document: it loads everything into memory and is not efficient for large data. Here is a snippet from real code. You'll need to adapt it to your case, but I hope you'll get the general idea.

    $reader = new XMLReader();
    $reader->xml($xml);
    // Get cursor to first article
    while($reader->read() && $reader->name !== 'article');

    // Iterate articles
    while($reader->name === 'article')
    {
        // Expand only the current node into DOM, then wrap it in SimpleXML
        $doc = new DOMDocument('1.0', 'UTF-8');
        $article = simplexml_import_dom($doc->importNode($reader->expand(), true));
        processing($article);
        $reader->next('article');
    }
    $reader->close();

$article is a SimpleXMLElement which can be processed further. This way you save a lot of memory, because only single article nodes go into memory at a time. Additionally, if each processing() call takes a long time, you can turn it into a background process which runs separately from the main script, and several processing() jobs can be started in parallel.
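A minimal sketch of that last idea, assuming a hypothetical worker.php that reads one article's XML from the file path passed as its first argument (the worker script and file handling are illustrative, not part of the code above):

    // Inside the while loop, instead of calling processing($article) inline:
    $file = tempnam(sys_get_temp_dir(), 'article_');
    file_put_contents($file, $article->asXML());
    // The trailing & detaches the worker so the reader loop keeps going (POSIX shells)
    exec('php worker.php ' . escapeshellarg($file) . ' > /dev/null 2>&1 &');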

Upvotes: 3
