Hailwood

Reputation: 92651

Find intersecting rows between two 2d arrays comparing differently keyed columns

I have two arrays.

The $first has 5000 arrays inside it and looks like:

array(
  array('number' => 1),
  array('number' => 2),
  array('number' => 3),
  array('number' => 4),
  ...
  array('number' => 5000)
);

and the $second has 16000 rows and looks like:

array(
  array('key' => 1, 'val' => 'something'),
  array('key' => 2, 'val' => 'something'),
  array('key' => 3, 'val' => 'something'),
  ...
  array('key' => 16000, 'val' => 'something'),
);

I want to create a third array that contains $second[$i]['val'] if $second[$i]['key'] matches any 'number' value in $first.

Currently I am doing:

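$third = array();
// Flatten $first into a plain list of numbers.
foreach($first as &$f)
  $f = $f['number'];
unset($f); // break the reference left over from the loop

foreach($second as $s){
  // in_array() scans the whole list on every iteration.
  if(in_array($s['key'], $first))
    $third[] = $s['val'];
}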

But unless I use PHP's set_time_limit(0), it times out. Is there a more efficient way?

Upvotes: 1

Views: 131

Answers (3)

mickmackusa

Reputation: 48000

You are asking for the "intersection" of the two arrays, compared on columns whose keys are not identical. Not to worry: PHP has a native function that is optimized under the hood for this task, array_uintersect(), and it needs no special data preparation. Within the custom callback, null coalesce to the opposite array's key name. The fallback is needed because $a and $b do not necessarily represent array1 and array2: the intersect/diff family of native array functions sorts while it filters, so column values from the same array may be compared against each other. Again, this is part of the source code optimization.

Code:

var_export(
    array_uintersect(
        $keyValues,
        $numbers,
        fn($a, $b) => ($a['number'] ?? $a['key']) <=> ($b['number'] ?? $b['key'])
    )
);

As a general rule, though, if you are going to make a lot of array comparisons and speed matters, it is better to make key-based comparisons instead of value-based comparisons. Because PHP implements arrays as hashmaps, key-comparison functions/processes generally outpace their value-comparing equivalents.
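As a rough illustration of the key-based approach, the sketch below re-keys both arrays and lets array_intersect_key() do the filtering. The sample data is made up to mirror the question's $first and $second, and the sketch assumes the key values in $second are unique (array_column() keeps only the last row for a repeated key).

```php
<?php
// Hypothetical sample data shaped like the question's $first and $second.
$first = [['number' => 1], ['number' => 3], ['number' => 5]];
$second = [
    ['key' => 1, 'val' => 'a'],
    ['key' => 2, 'val' => 'b'],
    ['key' => 3, 'val' => 'c'],
    ['key' => 4, 'val' => 'd'],
];

// Turn the wanted numbers into array keys: [1 => 0, 3 => 1, 5 => 2].
$wanted = array_flip(array_column($first, 'number'));

// Index the val column by the key column: [1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd'].
$valByKey = array_column($second, 'val', 'key');

// Keep only the entries whose key also exists in $wanted, then reindex.
$third = array_values(array_intersect_key($valByKey, $wanted));

var_export($third);
```

With this sample data, $third holds 'a' and 'c'. No callback is invoked per comparison here, which is where much of the saving over a value-based comparison would be expected to come from.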

If you need to isolate the val column data after filtering with array_uintersect(), array_column() will do that for you quickly.
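A minimal sketch of that last step, using made-up sample data and the question's variable names ($first, $second, $third) rather than the ones in the snippet above. The arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical sample data shaped like the question's $first and $second.
$first = [['number' => 1], ['number' => 3], ['number' => 5]];
$second = [
    ['key' => 1, 'val' => 'a'],
    ['key' => 2, 'val' => 'b'],
    ['key' => 3, 'val' => 'c'],
    ['key' => 4, 'val' => 'd'],
];

// Keep the rows of $second whose 'key' matches some 'number' in $first.
// Either argument of the callback may come from either array, hence the ?? fallback.
$matches = array_uintersect(
    $second,
    $first,
    fn($a, $b) => ($a['number'] ?? $a['key']) <=> ($b['number'] ?? $b['key'])
);

// array_uintersect() preserves the original keys of $second;
// array_column() extracts 'val' and returns a reindexed list.
$third = array_column($matches, 'val');

var_export($third);
```

With this sample data, $third holds 'a' and 'c'.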

Upvotes: 0

Kamil Szot

Reputation: 17817

$third = array();
$ftemp = array();
foreach($first as $f)
  $ftemp[$f['number']] = true;

foreach($second as $s){
  if(isset($ftemp[$s['key']]))
    $third[] = $s['val'];
}

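This should be way faster: isset() on an array key is a single hash lookup, whereas in_array() scans the list on every call.

Don't try to build the lookup dictionary in a more convoluted way like the ones below, because they are actually slower than the straightforward loop above: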
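To get a feel for the difference, here is a rough timing sketch. The data is generated to match the sizes in the question (5000 and 16000 rows), the lookup table is built with array_column() and array_flip() purely for brevity, and only the lookup loops are timed. Actual numbers will vary by machine and PHP version, so treat it as an experiment rather than a benchmark.

```php
<?php
// Hypothetical data sized like the question: 5000 numbers, 16000 keyed rows.
$first = array_map(fn($n) => ['number' => $n], range(1, 5000));
$second = array_map(fn($k) => ['key' => $k, 'val' => "val$k"], range(1, 16000));

$numbers = array_column($first, 'number');

// Value-based lookup: in_array() walks the list on every call.
$start = microtime(true);
$slow = [];
foreach ($second as $s) {
    if (in_array($s['key'], $numbers)) {
        $slow[] = $s['val'];
    }
}
$slowTime = microtime(true) - $start;

// Key-based lookup: isset() on an array key is a hash lookup.
$lookup = array_flip($numbers);
$start = microtime(true);
$fast = [];
foreach ($second as $s) {
    if (isset($lookup[$s['key']])) {
        $fast[] = $s['val'];
    }
}
$fastTime = microtime(true) - $start;

printf(
    "in_array: %.4fs, isset: %.4fs, same result: %s\n",
    $slowTime,
    $fastTime,
    $slow === $fast ? 'yes' : 'no'
);
```

Both loops produce the same list of values; only the cost of each membership test differs.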

$third = array();
$ftemp = array_flip(reset(call_user_func_array('array_map', array_merge(array(null), $first))));
// $ftemp = array_flip(array_map('reset', $first)); // this is also slower
// array_unshift($first, null); $ftemp = array_flip(reset(call_user_func_array('array_map', $first))); // and this is even slower and modifies $first array

foreach($second as $s){
  if(isset($ftemp[$s['key']]))
    $third[] = $s['val'];
}

Upvotes: 3

ChrisR

Reputation: 14467

This is probably serious nitpicking, but you could replace the foreach with a for, which is a little faster, though I doubt that will make a big difference. You are working on a big dataset, which might simply not be fast to process on a web server.

Upvotes: 0
