joren

Reputation: 1155

Generating an XML file with large amounts of data without hitting the memory cap

I have a WordPress site with about 300k pages in it now, and a server with 1GB of memory. Unfortunately, none of the sitemap-generating plugins can handle it anymore. I've tried three different methods of writing the XML with PHP (XMLWriter, SimpleXMLElement, and DOMDocument), and they all cap out at around 30k pages (XML nodes).

What do you think I could do to make this work? Worst case, I thought about setting up multiple cron jobs, spaced ten minutes apart, that would each open the file and append to it in chunks, but that's obviously not an optimal solution. I found a snippet that claimed to flush out memory during the loop, but it didn't do the trick either. Here's an example of that snippet:

$xml = '<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
for ($i = 0; $i < 20; $i++) {
  $query = mysql_query(sprintf("SELECT ID, post_date FROM wp_posts WHERE post_status='publish' LIMIT %s,%s", $i*10000, 10000));
  while ($row = mysql_fetch_array($query)) {
    $xml .= '<url>';
    $xml .= '<loc>'.get_permalink($row['ID']).'</loc>';
    $xml .= '<lastmod>'.$row['post_date'].'</lastmod>';
    $xml .= '<changefreq>weekly</changefreq>';
    $xml .= '<priority>0.6</priority>';
    $xml .= '</url>';
  }
}
$xml .= '</urlset>';

$sxe = new SimpleXMLElement($xml);
$sxe->asXML("sitemap.xml");

Upvotes: 1

Views: 2462

Answers (1)

E_p

Reputation: 3144

Why are you grabbing all the records at once?

Try fetching 10000 rows per query instead, and free the memory after every iteration.

If you run in CLI mode, note that older versions of PHP do not release memory back to the system, so you can try forking: http://php.net/manual/en/function.pcntl-fork.php
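
The fork-per-chunk idea can be sketched like this (a minimal sketch: the part-file names and chunk contents are illustrative placeholders, not from the post, and the real DB query would go where the comment indicates). Each chunk runs in a child process, so everything the child allocates is returned to the OS when it exits:

```php
<?php
// Sketch: process each chunk in a forked child so its memory is freed
// on exit. Falls back to inline processing if pcntl is unavailable.
$chunks = 4;
for ($i = 0; $i < $chunks; $i++) {
    $pid = function_exists('pcntl_fork') ? pcntl_fork() : -1;
    if ($pid === 0) {
        // Child: build this chunk's XML here (DB query + string building),
        // write it to a part file, then exit to release all memory.
        file_put_contents("sitemap-part-$i.xml", "<!-- rows " . ($i * 10000) . "+ -->");
        exit(0);
    }
    if ($pid > 0) {
        pcntl_waitpid($pid, $status); // parent waits; child memory is freed
    } else {
        // pcntl not available: do the same work inline.
        file_put_contents("sitemap-part-$i.xml", "<!-- rows " . ($i * 10000) . "+ -->");
    }
}
// Afterwards the parent can concatenate sitemap-part-*.xml into one file.
```
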

How to do it:

  1. There is no need for an XML library; sprintf will do the trick.
  2. Wrap it in for ($i = 0; $i < 5; $i++) {}
  3. The query would look like LIMIT ($i*10000), 10000

Code Example:

for ($i = 0; $i < 5; $i++) {
    ...
    // LIMIT placeholders must be bound as integers, otherwise emulated
    // prepares will quote them as strings and break the query.
    $sth = $dbh->prepare('SELECT * FROM table_name LIMIT ?, ?');
    $sth->bindValue(1, $i * 10000, PDO::PARAM_INT);
    $sth->bindValue(2, 10000, PDO::PARAM_INT);
    $sth->execute();
    ...
}
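
Point 1 above (plain sprintf instead of an XML library) can be sketched like this; $permalink and $postDate are illustrative stand-ins for get_permalink() and the post_date column:

```php
<?php
// One <url> entry via sprintf; htmlspecialchars guards against characters
// like & in permalinks that would otherwise produce invalid XML.
$permalink = 'https://example.com/?p=1&page=2'; // stand-in for get_permalink()
$postDate  = '2014-01-01 00:00:00';             // stand-in for post_date

$entry = sprintf(
    '<url><loc>%s</loc><lastmod>%s</lastmod>' .
    '<changefreq>weekly</changefreq><priority>0.6</priority></url>',
    htmlspecialchars($permalink, ENT_XML1),
    $postDate
);

echo $entry;
```
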

Another code example:

    <?php
    $fileHandle = fopen("sitemap.xml", "w");

    fwrite($fileHandle,
        '<?xml version="1.0" encoding="UTF-8"?>' . 
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"' .
        ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ' .
        ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9' .
        ' http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
    );

    for ($i = 0; $i < 20; $i++) {
        $query = mysql_query(sprintf("SELECT ID, post_date FROM wp_posts WHERE post_status='publish' LIMIT %s,%s", $i*10000, 10000));
        $xml = '';
        while ($row = mysql_fetch_array($query)) {
            $xml .= '<url>'.
                '<loc>'.get_permalink($row['ID']).'</loc>' .
                '<lastmod>'.$row['post_date'].'</lastmod>' .
                '<changefreq>weekly</changefreq>' .
                '<priority>0.6</priority>' .
                '</url>';
        }

        fwrite($fileHandle, $xml);
    }
    fwrite($fileHandle, '</urlset>');

    fclose($fileHandle);

    echo PHP_EOL . 'memory used: '. memory_get_peak_usage() . PHP_EOL;
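
To see why the per-chunk fwrite keeps memory flat, here is a self-contained version with dummy rows in place of the wp_posts query (URLs and counts are illustrative): only one chunk's XML string is ever held in memory, so peak usage does not grow with the total row count.

```php
<?php
// Streamed sitemap with dummy data: each chunk is flushed to disk
// before the next is built, so memory stays roughly constant.
$fh = fopen('sitemap-demo.xml', 'w');
fwrite($fh, '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">');

for ($chunk = 0; $chunk < 20; $chunk++) {
    $xml = '';
    for ($row = 0; $row < 1000; $row++) {
        $id = $chunk * 1000 + $row;
        $xml .= "<url><loc>https://example.com/?p=$id</loc></url>";
    }
    fwrite($fh, $xml); // flush the chunk; $xml is overwritten next pass
}

fwrite($fh, '</urlset>');
fclose($fh);

echo 'peak memory: ' . memory_get_peak_usage() . PHP_EOL;
```
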

Upvotes: 3
