Simon

Reputation: 1

Large PHP 5.4 Script gets slower

I'm using a PHP script to update product data. Memory consumption stays constant, but the time taken per 1,000 products keeps increasing:

[26000 - 439.75 MB / 14.822s]..........
[27000 - 439.25 MB / 15.774s]..........
[28000 - 438.25 MB / 15.068s]..........
[29000 - 437.75 MB / 16.317s]..........
[30000 - 437.25 MB / 16.968s]..........
[31000 - 436.25 MB / 17.521s]....

Even if I disable everything except reading a line from my variable containing the CSV data, the effect is the same, just with a lower rate of increase:

[65000 - 424.75 MB / 0.001s]..........
[66000 - 424.75 MB / 0.63s]..........
[67000 - 424.75 MB / 0.716s]..........
[68000 - 424.75 MB / 0.848s]..........
[69000 - 424.75 MB / 0.943s]..........
[70000 - 424.25 MB / 1.126s]..........
[71000 - 423.5 MB / 1.312s]....

I tried changing the GC settings (php -dzend.enable_gc=1 and php -dzend.enable_gc=0).

I load my data in advance with:

$this->file = file($file_path);

The next line is retrieved with:

$line = array_shift($this->file);

I don't know why this should consistently increase the required time, especially when I just array_shift the line without performing any actions on it.

My current solution is to split the file into pieces of 10,000 lines, which is not a desirable solution for a file that contains more than 300,000 lines and has to be updated every day.

It would be nice to at least understand what happens here...

Thanks in advance for any hints.

Upvotes: 0

Views: 155

Answers (3)

Mark Baker

Reputation: 212412

The issue with array_shift()

Part of the data maintained internally for every element in an array is a sequence number identifying that element's position within the array. These values are effectively sequential integers, starting from 0 for the first element. Don't confuse this with the key of an enumerated array: it is maintained purely internally and is completely separate from the key, so that you can do associative sorts, which effectively just reorganize these internal position values.

When you add a new element to an array, it needs to be given a new sequence value. If you're just adding the new element to the end of the array, that is as simple as taking the previous highest sequence value, adding one, and assigning the result to the new element: a simple O(1) operation. Likewise, if you remove the last element, it can simply be removed, and the sequence values of all other elements remain valid.

However, if you add a new element to the beginning of the array using array_unshift(), it is assigned the value 0, and every element already in the array needs its sequence value increased by 1. PHP internally has to traverse every element, which makes this an O(n) operation. Likewise, array_shift() has to decrement the sequence value of every remaining element once it has removed the first one, which is also O(n). If your array is very large, this can be a major overhead.
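
If the lines only need to be consumed in order and never put back, one way to sidestep that per-call cost is to leave the array alone and move an integer cursor through it. This is only a minimal sketch: the class name LineCursor and its next() method are made up for illustration and are not part of the script in the question.

```php
<?php
// Hypothetical helper: walks an array with a cursor instead of array_shift().
class LineCursor
{
    private $lines;
    private $pos = 0;

    public function __construct(array $lines)
    {
        // array_values() guarantees keys 0..n-1 so the cursor can index directly.
        $this->lines = array_values($lines);
    }

    // Returns the next line, or null once every line has been handed out.
    public function next()
    {
        if ($this->pos >= count($this->lines)) {
            return null;
        }
        return $this->lines[$this->pos++];
    }
}

// Made-up stand-in for the array returned by file().
$cursor = new LineCursor(array("a\n", "b\n", "c\n"));
$seen = array();
while (($line = $cursor->next()) !== null) {
    $seen[] = $line;
}
```

Each call to next() does a constant amount of work regardless of how many lines remain, at the price of keeping the whole array in memory until the cursor object is discarded.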

General performance

In answer to your performance issues.... why are you reading the entire file into memory in one go? Why can't you simply process it one line at a time?

$fh = fopen('filename.txt', 'r');
// fgets() reads one line per call and returns false at end of file
while (($item = fgets($fh)) !== false) {
    // .... processing here
}
fclose($fh);
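
Since the question mentions CSV data, a variation on the same idea is to let fgetcsv() parse each row as it is read. This is a sketch under assumptions: the sample columns, the comma delimiter, and the counting that stands in for the real product update are all made up, and the temporary file exists only so the example runs on its own.

```php
<?php
// Build a small throwaway CSV; in the real script this would be the product file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "sku,price\nA1,9.99\nB2,4.50\n");

$rows = 0;
$lastRow = null;
$fh = fopen($path, 'r');
// fgetcsv() returns false at end of file, so only one row is held at a time.
while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    if ($row === array(null)) {
        continue; // a blank line comes back as an array holding a single null
    }
    $lastRow = $row; // stand-in for the real per-product update
    $rows++;
}
fclose($fh);
unlink($path);
```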

And don't try to out-think PHP's garbage collection

Upvotes: 3

John Reid

Reputation: 1185

Is there a specific reason why you need to use array_shift()?

Maybe just reading the file and closing it would make your script run faster:

$this->file = file($file_path);
foreach ($this->file as $line) {
  // do the thing you need to do
}
unset($this->file);

Another thing is that you seem to be reading one array ($file) and turning it into another ($line). Maybe it might be worth using the $file array as it is?

I'm not sure exactly what you're doing - but hopefully these suggestions might help.

Upvotes: 0

sneexz

Reputation: 265

array_shift() should technically run faster the more it is used, as it has to re-index a smaller set.

Are you doing anything else with the returned result?

Alternatively, you may think about reversing the array before the loop:

$reversed = array_reverse($file);

And then popping the last value inside your loop:

$item = array_pop($reversed);
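
Put together, a minimal sketch of that loop might look like the following. The three-line $file array is a made-up stand-in for the real data, and the $order array only exists to show that the original line order is preserved.

```php
<?php
// Stand-in for the array returned by file(); the contents are made up.
$file = array("first\n", "second\n", "third\n");

// One reversal up front, then each array_pop() takes the element from the end.
$reversed = array_reverse($file);

$order = array();
// array_pop() returns null once the array is empty.
while (($item = array_pop($reversed)) !== null) {
    $order[] = $item; // stand-in for the real processing
}
```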

Upvotes: 0
