LeeTee

Reputation: 6601

PHP MySQL running query in batches

I am new to working with large amounts of data. I am wondering if there are any best practices when querying a database in batches or if anyone can give any advice.

I have a query that pulls out all the data, and PHP is used to write it to an XML file. There can be anywhere between 10 and 500,000 rows of data, so I have written the script to pull the data out in batches of 50: write to the file, get the next 50 rows, append those to the file, and so on. Is this OK, or should I be doing something else? Could I increase the batch size, or should I decrease it to make the script run faster?
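For illustration, the batching approach described might look roughly like this; mysqli, the table my_table, its columns, and out.xml are placeholders, not the actual script:

<?php
// Sketch of LIMIT/OFFSET batching: fetch 50 rows, append them to the
// XML file, then fetch the next 50, until a short batch signals the end.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$batchSize = 50;
$offset = 0;

$fh = fopen('out.xml', 'w');
fwrite($fh, "<rows>\n");

do {
    $result = $mysqli->query(
        "SELECT id, name FROM my_table ORDER BY id LIMIT $batchSize OFFSET $offset"
    );
    $rowCount = $result->num_rows;
    while ($row = $result->fetch_assoc()) {
        fwrite($fh, '  <row id="' . (int)$row['id'] . '">'
            . htmlspecialchars($row['name']) . "</row>\n");
    }
    $result->free();
    $offset += $batchSize;
} while ($rowCount === $batchSize); // a short batch means we are done

fwrite($fh, "</rows>\n");
fclose($fh);
?>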

Any advice would be much appreciated.

Upvotes: 5

Views: 17628

Answers (4)

xdzc

Reputation: 1471

The best way to go about this is to schedule it as a cron job; in my opinion that is the best solution for batch processing in PHP. Check this link for more info: Batch Processing in PHP. Hope this helps.
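For illustration, a crontab entry for such a scheduled job might look like this (the schedule, paths, and script name are assumptions, not from the answer):

# Hypothetical crontab entry: run the XML export nightly at 02:00
0 2 * * * /usr/bin/php /path/to/export_xml.php >> /var/log/export_xml.log 2>&1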

Upvotes: 1

ajreal

Reputation: 47321

You would be surprised: ONE simple SELECT of everything, with no LIMIT, is the fastest, because it queries the database only once; everything else is processed locally.

<?php
// set a very high memory limit, since the whole result is held locally
ini_set('memory_limit', '512M');

// query once without LIMIT; if you can avoid sorting, so much the better
$sql = "SELECT all_columns FROM table";
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$result = $mysqli->query($sql);

// iterate the MySQL result, copy it into an array, then free the result
$results = array();
while ($row = $result->fetch_assoc()) {
    $results[] = $row;
}
$result->free();

// write XML out once every one thousand rows,
// because building the XML consumes the MOST memory
$xml = '';
$len = count($results);
for ($i = 0; $i < $len; ++$i) {
    $arr = $results[$i];
    // XML preparation for one row (the structure here is a placeholder)
    $xml .= '<row>' . htmlspecialchars(implode(',', $arr)) . "</row>\n";

    // don't forget that file writes are expensive too
    if ($i % 1000 == 0 && $i > 0) {
        file_put_contents('out.xml', $xml, FILE_APPEND);
        $xml = '';
    }
}
// flush any rows remaining after the last full thousand
if ($xml !== '') {
    file_put_contents('out.xml', $xml, FILE_APPEND);
}
?>

Upvotes: 0

CLo

Reputation: 3730

The best way to do this depends on a couple of different things. The most important factor is when and why you are creating this XML file.

If you are creating the XML file on demand and a user is waiting for it, then you'll need to do some fine-tuning and performance testing.

If it's something created on a regular basis, maybe as a nightly or hourly task, with the XML file requested after it's built (something like an RSS feed builder), then if what you have works, I would recommend not messing with it.

As far as performance goes, there are different things that can help. Put some simple timers into your scripts and play with the number of records per batch to see if there are any performance differences.

$start = microtime(true);
// process batch
$end = microtime(true);
$runTimeMilliseconds = ($end - $start) * 1000; // microtime(true) returns seconds

If the issue is user feedback, you might consider using AJAX to kick off each batch and report progress to the user, as sketched below. If you give the user feedback, they'll usually be happy to wait longer than if they're just waiting for the whole page to refresh.
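A minimal sketch of the server side of that idea, assuming a hypothetical process_batch.php that the page polls with an increasing offset (table and column names are placeholders):

<?php
// process_batch.php -- hypothetical AJAX endpoint: process one batch
// per request and report progress back as JSON.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$batchSize = 50;
$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;

$totalRow = $mysqli->query("SELECT COUNT(*) AS c FROM my_table")->fetch_assoc();
$total = (int)$totalRow['c'];

$result = $mysqli->query(
    "SELECT id, name FROM my_table ORDER BY id LIMIT $batchSize OFFSET $offset"
);
$xml = '';
while ($row = $result->fetch_assoc()) {
    $xml .= '<row id="' . (int)$row['id'] . '">'
          . htmlspecialchars($row['name']) . "</row>\n";
}
file_put_contents('out.xml', $xml, FILE_APPEND);

// tell the page how far along we are
header('Content-Type: application/json');
echo json_encode(array(
    'processed' => min($offset + $batchSize, $total),
    'total'     => $total,
));
?>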

Also, check your SQL query to make sure there are no hidden performance penalties there. EXPLAIN can show you how MySQL goes about processing your queries: http://dev.mysql.com/doc/refman/5.0/en/explain.html

At an extreme, I'd imagine the best performance could be achieved through parallel processing. I haven't worked with it in PHP, but here's the primary reference: http://www.php.net/manual/en/refs.fileprocess.process.php

Depending on your hosting environment, you could find the total number of records and split them among sub-processes, each building its own XML fragment, and then combine the fragments at the end. So process 1 might handle records 0 to 99, process 2 records 100 to 199, etc.
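A rough sketch of that record-splitting idea, assuming the CLI-only pcntl extension (the manual page above also covers proc_open; everything below is a placeholder, not code from the answer):

<?php
// Hypothetical parallel export: fork one worker per record range; each
// worker writes its own XML fragment, and the parent combines them.
$totalRecords = 100000;                  // in practice, from SELECT COUNT(*)
$workers = 4;
$chunk = (int)ceil($totalRecords / $workers);

$pids = array();
for ($w = 0; $w < $workers; $w++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child: open its OWN database connection (forked processes must
        // not share one), export records [$w * $chunk, ($w + 1) * $chunk)
        // into fragment_$w.xml, then exit. Query/XML code omitted here.
        exit(0);
    }
    $pids[] = $pid;
}

// Parent: wait for every worker, then concatenate the fragments in order.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
for ($w = 0; $w < $workers; $w++) {
    file_put_contents('combined.xml',
        file_get_contents("fragment_$w.xml"), FILE_APPEND);
}
?>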

Upvotes: 1

rogal111

Reputation: 5933

Yes, for huge result sets it is recommended to use batches (for performance and memory reasons).

Here is a benchmark and example code for running a query in batches.
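One common batching pattern (a sketch under assumptions, not taken from the linked benchmark) is keyset pagination, which avoids the growing cost of large OFFSETs by seeking past the last id seen:

<?php
// Keyset pagination sketch: each batch resumes from the last id seen,
// so MySQL never has to scan and discard OFFSET rows. Names are placeholders.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$batchSize = 1000;
$lastId = 0;

do {
    $result = $mysqli->query(
        "SELECT id, name FROM my_table WHERE id > $lastId ORDER BY id LIMIT $batchSize"
    );
    $rowCount = $result->num_rows;
    while ($row = $result->fetch_assoc()) {
        $lastId = (int)$row['id'];
        // ... append this row to the XML file ...
    }
    $result->free();
} while ($rowCount === $batchSize); // a short batch means we are done
?>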

Upvotes: 5
