Reputation: 45350
I have the following simple code to test against collision on a primary key I am creating:
$machine_ids = array();
for($i = 0; $i < 100000; $i++) {
//Generate machine id returns a 15 character alphanumeric string
$mid = Functions::generate_machine_id();
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
}
die("Success!");
Any idea why this is taking many minutes to run? Anyway to speed it up?
Upvotes: 11
Views: 14581
Reputation: 218
If you need best performance for your case, you need store your data as array key and
use isset
or array_key_exists
(since php >= 7.4 array_key_exists
is now as fast as isset
) instead in_array
.
Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind that converting an array of values, ["foo", "bar", "baz"], to a hash map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory cost (as well as potentially constructing the hash map, depending on how and when you do it). As with all things, you'll have to weigh the pros & cons for each case to determine if a hash map or array (list) of values works best for your needs. This isn't specific to PHP but more of a general problem space of computer science.
And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a
<?php declare(strict_types = 1);
function testPerformance($name, Closure $closure, $runs = 1000000)
{
$start = microtime(true);
for (; $runs > 0; $runs--)
{
$closure();
}
$end = microtime(true);
printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}
$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
$items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);
$assocItems = array_combine($items, array_fill(0, count($items), true));
$in_array = function () use ($items) {
in_array(1111111, $items);
};
$isset = function () use ($assocItems) {
isset($items[1111111]);
};
$array_key_exists = function () use ($assocItems) {
array_key_exists(1111111, $assocItems);
};
testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);
Output:
Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds
Upvotes: 1
Reputation: 198334
For this, use $mid
as keys, and dummy value as value. Specifically, instead of
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
use
if(isset($machine_ids[$mid])) {
die("Collision!");
} else {
$machine_ids[$mid] = 1;
}
At the end you can extract the array you originally wanted with array_keys($machine_ids)
.
This should be much faster. If it is still slow, then your Functions::generate_machine_id()
is slow.
EDITED to add isset
as per comments.
Upvotes: 12
Reputation: 6406
Refactor your code so that it uses a associated array to hold the machine IDs and use isset
to check
if( isset($machine_id[$mid]) ) die("Collision");
$machine_ids[$mid] = $mid;
Using isset
should be faster
Upvotes: 2
Reputation: 47321
for($i = 0; $i < 100000; $i++)
{
//Generate machine id returns a 15 character alphanumeric string
$mid = Functions::generate_machine_id();
if (isset($machine_ids[$mid]))
{
die("Collision!");
}
$machine_ids[$mid] = true;
}
Upvotes: 14
Reputation: 11866
Checking for array membership is a O(n) operation, since you have to compare the value to every element in the array. After you add a whole bunch of stuff to the array, naturally it gets slower.
If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.
Upvotes: 3