jmcampbell

Reputation: 388

PHP program fails with large amounts of data

I am writing a PHP web program that retrieves data from a file and graphs it. It works fine if I run it from the command line on the server, and it works from a browser for relatively small amounts of data, but once the file reaches 1.2 or 1.3 million lines (30 chars/line, so roughly 35 or 40 MB), I get an HTTP 500 error. The odd thing is that it fails inconsistently; with 1.25 million lines, it sometimes works and sometimes doesn't. Here is the code:

<?php

$wait = $_GET["wait"];
$measure = $_GET["measure"];
$graphsize = $_GET["graphsize"];
$title = "Current";
if ($measure == "CURR") $title = "Current (A)";
if ($measure == "VOLT") $title = "Voltage (V)";
if ($measure == "RES")  $title = "Resistance (Ohms)";

$page = $_SERVER["PHP_SELF"];

$data = array(array("Time", $title));
$datasize = filesize("data.csv")/30;
$x = 0;

$file = fopen("data.csv", "r");
while (($datum = fgets($file)) !== false) {
    $x++;
    if ($x % ($datasize/$graphsize) == 0) {
        $datum = explode(",", $datum);
        $datum[0] = floatval($datum[0]);
        $datum[1] = floatval($datum[1]);
        $data[] = $datum;
    }
}
fclose($file);

if (count($data) == 1) $data[] = array(0,0);

?>

graphing stuff down here, I'm pretty sure this isn't the problem

Upvotes: 1

Views: 856

Answers (1)

S. Imp

Reputation: 2895

On some systems, PHP has two distinct php.ini files -- one for Apache and a different one for the CLI. Usually, the CLI ini file doesn't put any limit on max_execution_time and also has a large value for memory_limit. This would probably explain why your script runs via the CLI but not via the web server.
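A quick way to confirm this (a minimal sketch; php_sapi_name and php_ini_loaded_file are standard PHP functions) is to run the following both in a browser and from the command line and compare the output:

// each SAPI may load a different php.ini -- report which one this is
echo php_sapi_name() . " is using " . php_ini_loaded_file() . "\n";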

You are wise to parse the file line by line, as this consumes less memory than reading the entire file contents in at once. You should also check the result of fopen to make sure you are in fact opening the file:

$file = fopen("data.csv", "r");
if (!$file) {
    // fopen returns false if the file is missing or unreadable
    throw new Exception("Could not open data file");
}

If your script is returning a 5XX result when accessed via the web server, this usually means that the PHP script encountered a fatal error condition. I'm guessing you are either a) running out of time or b) running out of memory. To find out, you'll need to look at the PHP error. If it's not being output directly to your browser, then you'll need to figure out where your PHP error log is. There may be a value specified for this or there may not. Try this to see if a value is set:

echo ini_get("error_log");

If this value is empty, then per the PHP manual:

If this directive is not set, errors are sent to the SAPI error logger. For example, it is an error log in Apache or stderr in CLI.

On my Ubuntu machines, this file is set in the Apache conf file for each domain like so:

ErrorLog /var/www/site_name/log/error.log

But it might be somewhere totally different on your machine. If you can't find it, consider using the set_error_handler function to create your own custom error-handling function that can intercept each error and write it to a file, email it, or just spit it out.
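A minimal sketch of that approach (the log path is just a placeholder -- point it somewhere your web server can write). One caveat: a handler registered with set_error_handler never sees fatal errors such as hitting the memory limit, so a shutdown function is included to catch those via error_get_last:

$my_log = "/tmp/php_custom_errors.log"; // hypothetical location

set_error_handler(function ($severity, $message, $file, $line) use ($my_log) {
    // append each non-fatal error to our own log file (type 3 = append to file)
    error_log(date("c") . " [$severity] $message in $file:$line\n", 3, $my_log);
    return false; // fall through to PHP's normal error handling as well
});

register_shutdown_function(function () use ($my_log) {
    // fatal errors (out of memory, max execution time) skip the handler above,
    // but error_get_last() can still report them as the script shuts down
    $err = error_get_last();
    if ($err !== null && $err["type"] === E_ERROR) {
        error_log(date("c") . " FATAL: {$err['message']} in {$err['file']}:{$err['line']}\n", 3, $my_log);
    }
});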

It would also be informative to check what limits your php.ini sets for a couple of settings:

// feel free to add more ini settings to this array if you are curious
$to_check = array("max_execution_time", "memory_limit", "error_log");
foreach($to_check as $setting) {
    echo $setting . ": " . ini_get($setting) . "\n<br>";
}

If these values look unsatisfactory, you may have some luck changing them in your script itself using ini_set, or by editing the php.ini for your web server. I wouldn't recommend the latter, though, as the values therein are set to protect your server -- if you allow scripts to run too long or consume too much memory, your server is vulnerable to a runaway script consuming all of its resources. Note also that if your server is running in safe_mode, you are not allowed to change these settings with ini_set.
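For example, something like this at the top of your script (a sketch -- the values are only illustrations, and whether they take effect depends on your host's configuration):

// attempt to raise the limits for this request only
ini_set("max_execution_time", "300"); // seconds; set_time_limit(300) is equivalent
ini_set("memory_limit", "256M");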

I'd also suggest that you take a look at the PHP functions memory_get_usage() and microtime(). You can track memory used and elapsed time in your script to get some idea of what values are reached before the script fails. While it would be easier to just echo these values from your script, that would mean a lot of output, which is probably not a good idea. I suggest you write the values to a file instead. Something like:

$log_file = "/some/path/to/log/file.txt";
$start_time = microtime(true); // returns a Unix timestamp as a float
file_put_contents($log_file, "start time is " . $start_time . "\n")
    or die("Unable to write log file");

// your script blah blah blah
$file = fopen("data.csv", "r");
$x = 0;
while (($datum = fgets($file)) !== false) {

    // do your script datum stuff blah blah blah

    // append progress to our log file -- without FILE_APPEND each call would
    // overwrite the file; throttle the writes so logging a million-line loop
    // doesn't become the bottleneck itself
    if (++$x % 10000 == 0) {
        file_put_contents($log_file, "elapsed time is " . (microtime(true) - $start_time) . "\n", FILE_APPEND)
            or die("Unable to write elapsed time to log file");
        file_put_contents($log_file, "memory consumed is " . memory_get_usage() . "\n", FILE_APPEND)
            or die("Unable to write memory usage to log file");
    }
}

If your script fails, then you can go look at the contents of /some/path/to/log/file.txt and see how much time and memory were used before it stopped.

Upvotes: 1
