Justin
Justin

Reputation: 45350

PHP in_array() horrible performance. Fatest way to search array for value

I have the following simple code to test against collision on a primary key I am creating:

$machine_ids = array();

for($i = 0; $i < 100000; $i++) {
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();

    if(in_array($mid, $machine_ids)) {
        die("Collision!");
    } else {
        $machine_ids[] = $mid;  
    }
}

die("Success!");

Any idea why this is taking many minutes to run? Anyway to speed it up?

Upvotes: 11

Views: 14581

Answers (5)

Arsen
Arsen

Reputation: 218

If you need best performance for your case, you need store your data as array key and use isset or array_key_exists(since php >= 7.4 array_key_exists is now as fast as isset) instead in_array.

Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind that converting an array of values, ["foo", "bar", "baz"], to a hash map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory cost (as well as potentially constructing the hash map, depending on how and when you do it). As with all things, you'll have to weigh the pros & cons for each case to determine if a hash map or array (list) of values works best for your needs. This isn't specific to PHP but more of a general problem space of computer science.

And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$in_array = function () use ($items) {
    in_array(1111111, $items);
};

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

Output:

Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds

Upvotes: 1

Amadan
Amadan

Reputation: 198334

For this, use $mid as keys, and dummy value as value. Specifically, instead of

if(in_array($mid, $machine_ids)) {
    die("Collision!");
} else {
    $machine_ids[] = $mid;  
}

use

if(isset($machine_ids[$mid])) {
    die("Collision!");
} else {
    $machine_ids[$mid] = 1;  
}

At the end you can extract the array you originally wanted with array_keys($machine_ids).

This should be much faster. If it is still slow, then your Functions::generate_machine_id() is slow.

EDITED to add isset as per comments.

Upvotes: 12

Jay Sidri
Jay Sidri

Reputation: 6406

Refactor your code so that it uses a associated array to hold the machine IDs and use isset to check

if( isset($machine_id[$mid]) ) die("Collision");

$machine_ids[$mid] = $mid;

Using isset should be faster

Upvotes: 2

ajreal
ajreal

Reputation: 47321

for($i = 0; $i < 100000; $i++) 
{
  //Generate machine id returns a 15 character alphanumeric string
  $mid = Functions::generate_machine_id();
  if (isset($machine_ids[$mid]))
  {
    die("Collision!");
  }
  $machine_ids[$mid] = true;
}

Upvotes: 14

verdesmarald
verdesmarald

Reputation: 11866

Checking for array membership is a O(n) operation, since you have to compare the value to every element in the array. After you add a whole bunch of stuff to the array, naturally it gets slower.

If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.

Upvotes: 3

Related Questions