user2029890

Reputation: 2713

PHP looping through huge text file is very slow, can you improve?

The data contained in the text file (actually a .dat) looks like:

LIN*1234*UP*abcde*33*0*EA
LIN*5678*UP*fghij*33*0*EA
LIN*9101*UP*klmno*33*23*EA

There are actually over 500,000 such lines in the file.

This is what I'm using now:

// retrieve file once
$file = file_get_contents('/data.dat');
$file = explode('LIN', $file);

    ...some code

foreach ($list as $item) { // an array containing 10 items
    foreach ($file as $line) { // checking if these items are on huge list
        $info = explode('*', $line);
        if ($info[3] == $item[0]) { // compare the parsed field, not a character of $line
            ...do stuff...
            break; // stop checking if found
        }
    }
}

The problem is that it runs far too slowly: about 1.5 seconds for each iteration. I separately confirmed that it is not the '...do stuff...' part that is impacting speed; rather, it's the search for the correct item.

How can I speed this up? Thank you.

Upvotes: 2

Views: 2408

Answers (3)

na-98

Reputation: 889

When you call file_get_contents, it loads the entire file into memory, so you can imagine how resource-intensive that is. On top of that, you have a nested loop: with 10 items and over 500,000 lines that is O(n·m) work, up to roughly 5,000,000 explode() calls in the worst case.

You can either split the file, if possible, or use fopen, fgets and fclose to read it line by line.

If I were you, I'd use another language like C++ or Go if I really needed the speed.

Upvotes: 0

Ja͢ck

Reputation: 173642

If each item is on its own line, instead of loading the whole thing in memory, it might be better to use fgets() instead:

$f = fopen('text.txt', 'rt');

// Loop on the return value of fgets() so the final failed read is never parsed
while (($line = fgets($f)) !== false) {
    $line = rtrim($line, "\r\n");
    $info = explode('*', $line);
    // etc.
}

fclose($f);

PHP file streams are buffered (~8kB), so it should be decent in terms of performance.

The other piece of logic can be rewritten like this (instead of iterating the file multiple times):

if (in_array($info[3], $items)) // look up $info[3] inside the array of 10 things

Or, if $items is suitably indexed:

if (isset($items[$info[3]])) { ... }
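
Putting the two ideas together, here is a minimal sketch of a single pass over the file with a keyed lookup. It assumes one record per line, and that the 10 items can be reduced to a flat list of codes (for the question's $list, something like array_column($list, 0) might do that). The function name findItems and the temporary sample file are hypothetical, used only for illustration:

```php
<?php
// Hypothetical helper: returns the parsed fields of every line whose
// fourth field (index 3) is one of the wanted codes.
function findItems(string $path, array $wanted): array
{
    // Flip the list so the codes become keys; isset() on a key is a hash lookup.
    $index = array_flip($wanted);
    $found = [];

    $f = fopen($path, 'r');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }

    while (($line = fgets($f)) !== false) {
        $info = explode('*', rtrim($line, "\r\n"));
        if (isset($info[3], $index[$info[3]])) {
            $found[$info[3]] = $info;
            // Stop early once every wanted code has been seen.
            if (count($found) === count($index)) {
                break;
            }
        }
    }

    fclose($f);
    return $found;
}

// Usage with the three sample lines from the question (written to a temp file).
$path = tempnam(sys_get_temp_dir(), 'dat');
file_put_contents(
    $path,
    "LIN*1234*UP*abcde*33*0*EA\nLIN*5678*UP*fghij*33*0*EA\nLIN*9101*UP*klmno*33*23*EA\n"
);

$result = findItems($path, ['klmno', 'abcde', 'zzzzz']);
print_r(array_keys($result));
unlink($path);
```

The file is read once regardless of how many codes are wanted, and only the matching records are kept in memory.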

Upvotes: 3

Giacomo1968

Reputation: 26076

file_get_contents loads the whole file into memory as a single string, and only then does your code act on it. Adapting this sample code from the official PHP fgets documentation should work better:

$handle = @fopen("test.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $file_data = explode('LIN', $buffer);
        foreach($file_data as $line) {
            $info = explode('*', $line);
            $info = array_filter($info);
            if (!empty($info)) {
                echo '<pre>';
                print_r($info);
                echo '</pre>';
            }
        }         
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

The output of the above code using your data is:

Array
(
    [1] => 1234
    [2] => UP
    [3] => abcde
    [4] => 33
    [6] => EA

)
Array
(
    [1] => 5678
    [2] => UP
    [3] => fghij
    [4] => 33
    [6] => EA

)
Array
(
    [1] => 9101
    [2] => UP
    [3] => klmno
    [4] => 33
    [5] => 23
    [6] => EA
)
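
One detail worth noting in the output above: index 0 is missing from every array and index 5 is missing from the first two. That is because array_filter() without a callback drops every falsy element, which includes the string "0" as well as the empty string. If the zero quantities matter, a small sketch like the following (with an explicit callback, which is my addition and not part of the sample above) keeps them:

```php
<?php
// One record as it looks after explode('LIN', ...) has stripped the prefix.
$info = explode('*', '*1234*UP*abcde*33*0*EA');

// No callback: removes '' and also the legitimate "0" field.
$loose = array_filter($info);

// Explicit callback: removes only the empty strings.
$strict = array_filter($info, function ($value) {
    return $value !== '';
});

print_r(array_keys($loose));
print_r(array_keys($strict));
```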

The rest of your code is still unclear, though, since part of it is missing. The line that states:

foreach ($list as $item) { //an array containing 10 items

seems to be another real choke point.
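
If the outer loop has to stay, one possible way to take the pressure off it is to read the file once into an array keyed by the fourth field, so each item becomes a single isset() check instead of a scan. This is only a sketch: it assumes one record per line and that each $item holds its code at index 0, as in the question. The name buildIndex and the sample $list are hypothetical, and with 500,000 records the index itself will take a noticeable amount of memory:

```php
<?php
// Hypothetical helper: reads the file once and indexes each record by its
// fourth field (index 3), keeping the first occurrence of each code.
function buildIndex(string $path): array
{
    $f = fopen($path, 'r');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $index = [];
    while (($line = fgets($f)) !== false) {
        $info = explode('*', rtrim($line, "\r\n"));
        if (isset($info[3]) && !isset($index[$info[3]])) {
            $index[$info[3]] = $info;
        }
    }

    fclose($f);
    return $index;
}

// Sample data from the question, written to a temp file for the demo.
$path = tempnam(sys_get_temp_dir(), 'dat');
file_put_contents(
    $path,
    "LIN*1234*UP*abcde*33*0*EA\nLIN*5678*UP*fghij*33*0*EA\nLIN*9101*UP*klmno*33*23*EA\n"
);
$index = buildIndex($path);
unlink($path);

// Assumed shape of $list: each item is an array with the code at index 0.
$list = [['fghij'], ['missing']];
$matches = [];
foreach ($list as $item) {
    if (isset($index[$item[0]])) {
        // Collect the second field (index 1) of the matching record.
        $matches[] = $index[$item[0]][1];
    }
}
print_r($matches);
```

This trades memory for speed, so it mainly pays off when the same file is searched many times in one run.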

Upvotes: 0
