Alladin

Reputation: 1062

PHP array performance when adding elements one by one vs. adding all of the data at once

For the sake of simplicity I will use a simple example: I have an $array and a few keys and values that I want to add to it. What is better, primarily from a performance perspective:

  1. to add all of those keys and values to the array in one statement, or
  2. is it fine to just add them one by one?


$array = [
    $key1 => $value1, 
    $key2 => $value2
];

OR

$array[$key1] = $value1;
$array[$key2] = $value2;

Upvotes: 1

Views: 1224

Answers (2)

Markus AO

Reputation: 4889

If you have a handful of keys/values, it will make absolutely no difference. If you deal in arrays with 100K+ members, it does actually make a difference. Let's build some data first:

$r = [];
for($i = 1; $i <= 100000; $i++) {
    $r[] = $i; // for numerically indexed array
    // $r["k_{$i}"] = $i; // for associative array
    // array_push($r, $i); // with function call
}

This generates an array with 100,000 members, one by one. When added with a numeric (auto)index, the loop takes ~0.0025 sec on my laptop, with memory usage at ~6.8MB. If I use array_push, it takes ~0.0065 sec due to the function-call overhead. When $i is added with a named key, it takes ~0.015 sec, with memory usage at ~12.8MB. So named keys are slower to define.
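For reference, a minimal timing harness along those lines might look like the sketch below; it just wraps the loop from above in microtime() and memory_get_usage() calls, and the exact figures will of course vary by machine and PHP version:

// Sketch of how the figures above can be measured; numbers vary per machine.
$start  = microtime(true);
$before = memory_get_usage();

$r = [];
for ($i = 1; $i <= 100000; $i++) {
    $r[] = $i;              // numeric (auto)index variant
    // $r["k_{$i}"] = $i;   // associative variant
}

printf(
    "%.4f sec, %.1f MB\n",
    microtime(true) - $start,
    (memory_get_usage() - $before) / 1048576
);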

But would it make a difference if you shaved 0.015 sec down to 0.012 sec? Or, at 10x the volume, 0.15 sec down to 0.12 sec, or even 0.075 sec? Not really. It would only start to become noticeable if you had 1M+ members. What you actually do with that volume of data will take much longer, and should be the primary focus of your optimization efforts.


Update: I prepared three files: one with the 100K integers from above defined in one set; another with the same 100K integers defined separately, one assignment per statement; and a third with the data serialized as JSON. I loaded each and logged the time. It turns out that there is a difference: the definition "in one set" is 50% faster and more memory-efficient, and deserializing the data from JSON is 3x faster than including a "native array".

  • "In One Set": 0.075 sec, 9.9MB
  • "As Separate": 0.150 sec, 15.8MB
  • "From JSON": 0.025 sec, 9.9MB
  • "From MySQL": 0.110 sec, 13.8MB*

In short: if you define large arrays in native PHP format, define them in one go rather than bit by bit. If you load bulk array data from a file, json_decode(file_get_contents('data.json'), true) is significantly faster than include 'data.php'; with a native PHP array definition. Your mileage may vary with more complex data structures; however, I wouldn't expect the basic performance pattern to change. For reference: Source data at BitBucket.
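A rough sketch of the two loading strategies being compared (assuming data.php ends with a return statement for its array, and data.json holds the same data produced by json_encode()):

// Sketch of the comparison above. Assumes data.php ends in `return [...];`
// and data.json contains the same array serialized with json_encode().
$t = microtime(true);
$fromPhp = include 'data.php';
printf("include 'data.php': %.4f sec\n", microtime(true) - $t);

$t = microtime(true);
$fromJson = json_decode(file_get_contents('data.json'), true);
printf("json_decode(...): %.4f sec\n", microtime(true) - $t);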

A curious observation: generating the data from scratch, in our loop above, was actually much faster than loading/parsing it from a file with a ready-made array!

*MySQL: Key-value pairs were fetched from a two-column table with PDO into an array matching the sample data, using fetchAll(PDO::FETCH_UNIQUE|PDO::FETCH_COLUMN).
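A minimal sketch of that fetch; the DSN, credentials and the two-column table kv(k, v) are placeholders I've assumed here, not part of the original test:

// Sketch of the PDO fetch described above. The DSN, credentials and the
// two-column table kv(k, v) are assumed placeholders.
$pdo  = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->query('SELECT k, v FROM kv');

// First column becomes the array key, second column the value.
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE | PDO::FETCH_COLUMN);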


Best practice: When defining data that you actually work with (rather than "crude export/import" data that is never read or manually edited), construct your arrays in a manner that makes your code easy to maintain. I personally find it "cleaner" to keep simple arrays "contained":

$data = [
    'length' => 100,
    'width' => 200,
    'foobar' => 'possibly'
];

Sometimes your array needs to "refer to itself" and the "bit-by-bit" format is necessary:

$data['length'] = 100;
$data['width'] = 200;
$data['square'] = $data['length'] * $data['width'];

If you build multidimensional arrays, I find it "cleaner" to separate each "root" dataset:

$data = [];
$data['shapes'] = ['square', 'triangle', 'octagon'];
$data['sizes'] = [100, 200, 300, 400];
$data['colors'] = ['red', 'green', 'blue'];

On a final note, by far the most limiting performance factor with PHP arrays is memory usage (see: array hashtable internals), which is unrelated to how you build your arrays. If you have massive datasets in arrays, make sure you don't keep unnecessary modified copies of them floating around beyond their scope of relevance. Otherwise your memory usage will rocket.
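To illustrate that last point, a small sketch; thanks to PHP's copy-on-write behavior, a copy only costs memory once it is modified, and unset() releases it again (the figures are only indicative):

// Sketch: modified copies of a large array double its memory footprint
// until they are unset.
$big = range(1, 100000);

$copy   = $big;      // cheap: copy-on-write, storage still shared
$copy[] = 'changed'; // now the whole hashtable is duplicated
printf("%.1f MB\n", memory_get_usage() / 1048576);

unset($copy);        // drop the duplicate as soon as it's irrelevant
printf("%.1f MB\n", memory_get_usage() / 1048576);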


Tested on Win10 / PHP 8.1.1 / MariaDB 10.3.11 @ Thinkpad L380.

Upvotes: 3

davidkonrad

Reputation: 85538

If it were a suspected bottleneck (perhaps millions of items?), I would go straight to a little performance test. See this little script, comparing assignment of multiple keys in one statement (array[key = , , ]) with separate key assignments in consecutive statements (array[key] = ), each repeated 1,000,000 times:

$time_start = microtime(true);
$a1 = array();
for ($i = 0; $i < 1000000; ++$i) {
  $a1 = [
    'key1' => $i, 
    'key2' => $i + 1
  ];
}
$time_end = microtime(true);
printf('Took %f seconds for inline array[key = ]<br>', $time_end - $time_start);
 
$time_start = microtime(true);
$a2 = array();
for ($i = 0; $i < 1000000; ++$i) {
  $a2['key1'] = $i;
  $a2['key2'] = $i + 1;
}
$time_end = microtime(true);
printf('Took %f seconds for array[key] = <br>', $time_end - $time_start);

That gives me (the picture is more or less the same on each run):

Took 0.195255 seconds for inline array[key = ]
Took 0.204276 seconds for array[key] =

So, it really doesn't matter - there is no noticeable difference you have to worry about - but updating multiple keys in one statement does seem to be a little bit faster most of the time, though not guaranteed on every run.

And that is also exactly what we could expect! Think about it logically: updating the array keys in one statement is slightly more efficient than updating the same keys in multiple consecutive statements, simply because the array in memory is accessed fewer times.

Upvotes: 1
