ian

Reputation: 313

Keep array rows where a column value is found in a second flat array

I have an array, $arr1, with 5 columns as such:

 key    id  name    style   age whim
 0      14  bob     big     33  no
 1      72  jill    big     22  yes
 2      39  sue     yes     111 yes
 3      994 lucy    small   23  no
 4      15  sis     med     24  no
 5      16  maj     med     87  yes
 6      879 Ike     larg    56  no
 7      286 Jed     big     23  yes

This array is in a cache, not a database.

I then have a second array with a list of id values -

$arr2 = array(0=>14, 1=>72, 2=>8790);

How do I filter $arr1 so it returns only the rows with the id values in $arr2?

I have tried to use a filter function (below), array_search, and several others, but cannot figure out how to make it work.

$resultingArray = [];  // create an empty array to hold rows
$filter_function = function ($row) use ($arr2) {
    foreach ($arr2 as $arr) {
        return ($row['id'] == $arr);
    }
}

Upvotes: 3

Views: 629

Answers (6)

TylerH

Reputation: 21080

Migrating OP's solution from the question to an answer:

I got my code to work as follows:

$arr1 = new CachedStuff();  // get cache

$resultingArray = [];  // create an empty array to hold rows
$filter_function = function ($row) use ($arr2) {
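   // array_search() returns the matching key, which is 0 for the first element,
   // so compare against false instead of relying on truthiness
   return array_search($row['id'], $arr2) !== false;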
};
$resultingArrayIDs = $arr1->GetIds($filter_function, $resultingArray);

This gives me two outputs: $resultingArray & $resultingArrayIDs, both of which represent the intersection of $arr1 and $arr2.
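One thing to watch when array_search() is used as a filter condition: it returns the matching key, not a boolean, and the key of the first element is 0. The sketch below uses the $arr2 values from the question and assumes the callback's return value is evaluated as a boolean; how GetIds() actually treats it is not shown in the post.

```php
// Filter list from the question; 14 sits at index 0.
$arr2 = [14, 72, 8790];

// array_search() returns the matching key, or false when nothing matches.
$keyFor14 = array_search(14, $arr2);   // int 0
$keyFor39 = array_search(39, $arr2);   // false

// A truthiness check cannot tell key 0 apart from "not found".
$looseMatch  = (bool) $keyFor14;       // false, although 14 is in $arr2
$strictMatch = $keyFor14 !== false;    // true

var_dump($looseMatch, $strictMatch, in_array(14, $arr2, true));
```

Comparing with !== false, or using in_array() directly, avoids silently dropping the row whose id happens to be first in the list.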

Upvotes: 0

mickmackusa

Reputation: 48001

This whole task can be accomplished with just one slick, native function call -- array_uintersect().

Because the two parameters passed to the custom callback may come from either input array, try to read the id column first and, if it isn't set, fall back to the parameter's own value.

Under the hood, this function performs sorting while evaluating as a means to improve execution time / processing speed. I expect this approach to outperform iterated calls of in_array() purely from a point of minimized function calls.

Code: (Demo)

var_export(
    array_uintersect(
        $arr1,
        $arr2,
        fn($a, $b) =>
            ($a['id'] ?? $a)
            <=>
            ($b['id'] ?? $b)
    )
);
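For readers who want to try this against concrete data, here is a small sketch using a trimmed-down copy of the question's rows (only id and name are kept, which is a simplification for brevity). The variable names $byId and $matches are illustrative.

```php
// Trimmed-down rows from the question (only id and name kept for brevity).
$arr1 = [
    ['id' => 14,  'name' => 'bob'],
    ['id' => 72,  'name' => 'jill'],
    ['id' => 39,  'name' => 'sue'],
    ['id' => 994, 'name' => 'lucy'],
];
$arr2 = [14, 72, 8790];

// Compare by the row's id when the argument is a row, else by the scalar itself.
$byId = fn($a, $b) => ($a['id'] ?? $a) <=> ($b['id'] ?? $b);

$matches = array_uintersect($arr1, $arr2, $byId);

// The matching rows keep their original $arr1 keys; array_values() reindexes if needed.
print_r(array_values($matches));
```

The id 8790 has no counterpart in $arr1, so it simply contributes nothing to the result.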

Upvotes: 4

Markus AO

Reputation: 4889

This answer was migrated from a deleted duplicate. Revised to make sense independent of context.

Assume the following sample data (named $items and $select instead of $arr1 and $arr2 for clarity):

// Source data: A multidimensional array with named keys
$items = [
    ['id' => 1, 'name' => 'Foo'],
    ['id' => 3, 'name' => 'Bar'],
    ['id' => 5, 'name' => 'Maz'],
    ['id' => 6, 'name' => 'Wut'],
];

// Filter values: A flat array of scalar values
$select = [1, 5, 6];

Then, how do we extract $items with an id that matches one of the values in $select? And further, how do we do that in a manner that scales gracefully for larger datasets? Let's look at the possibilities and compare their weights.

1. Optimizing array_filter():

The answer using array_filter certainly gets the job done. However, there's an in_array function call made at each iteration. With small datasets, this is hardly an issue. With larger datasets, repeated function calls in an iteration can result in a significant performance hit. For large loops, it's therefore worth "preprocessing" the data where possible, so that each iteration can use a cheap language construct in place of a more expensive function call.

How to avoid in_array() in loops?

You can "enable" simple index lookups with array_flip($select), i.e. by swapping keys and values, and then using isset (a language construct, not a function!): isset($select[$id]). This performs significantly better than repeated calls to in_array($id, $select) on larger datasets: it avoids a function call, and it replaces the scan that in_array makes over the $select array at every iteration (over and over) with a single key lookup. Optimized as follows:

$select = array_flip($select);
$selected_items = array_filter($items, function($item) use ($select) {
    return isset($select[$item['id']]);
});

Or using an arrow function (PHP 7.4+), which captures variables from the parent scope automatically and so doesn't need the use statement:

$select = array_flip($select);
$selected_items = array_filter($items, fn($item) => isset($select[$item['id']]));

2. Using Key Intersection

One elegant alternative to filtering is key intersection. First, we re-index the array by the desired lookup key using array_column(), with null for column key (returns full array instead of a specific column), and with id for the new index key:

$items_by_id = array_column($items, null, 'id');

This gives you the same source array, but instead of being zero-indexed, it now uses the id column's value for the index key. Then, we're an array_intersect_key away from extracting the selection from the source array:

$selected_items = array_intersect_key($items_by_id, array_flip($select));

Here we flip the $select to intersect keys. Note that array_intersect_key performs better than approaches using array_intersect. (Keys are simple!) Result as expected. See demo of this approach. Finally, here's a one-liner (formatted for easy reading) without the throw-away variable:

$selected_items = array_intersect_key(
    array_column($items, null, 'id'), 
    array_flip($select)
);

N.B. The resulting array retains each item's actual id as its index key, instead of the default zero-indexed keys. Keep that in mind if you cross-reference the selected items with your source array later on in your code, and perhaps index items by the proper ID from the beginning.
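To make the key behaviour concrete, here is a short sketch using the sample $items and $select from above. The variables $keys and $reindexed are only there for illustration.

```php
$items = [
    ['id' => 1, 'name' => 'Foo'],
    ['id' => 3, 'name' => 'Bar'],
    ['id' => 5, 'name' => 'Maz'],
    ['id' => 6, 'name' => 'Wut'],
];
$select = [1, 5, 6];

$selected_items = array_intersect_key(
    array_column($items, null, 'id'),
    array_flip($select)
);

// The keys are now the item ids rather than 0, 1, 2.
$keys = array_keys($selected_items);          // [1, 5, 6]

// array_values() restores a zero-indexed list when the ids aren't needed as keys.
$reindexed = array_values($selected_items);

var_export($keys);
```

Also note that re-indexing with array_column() keeps only the last row for any id that appears more than once, so this approach assumes the ids are unique.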


Comparing these approaches:

array_filter() incurs 1 iteration of $items with 1 (anonymous) function call per array member, plus as many scans of $select as there are items if in_array is used to compare the current item's ID with each $select member. (Use key lookups instead.)

The answer using array_search in a foreach loop carries the same weight: count($items) function calls, and a whole lot of redundant rounds over the selection/filter array.

The array_intersect_key method 1. iterates over $items once (simple reindexing); 2. iterates over $select once (key/value flip); and 3. checks the keys of one array against the keys of the other for an intersection. Since array keys are hash-indexed, each of those checks is a cheap lookup, and as such this is much more efficient than repeated array scans for each value. (This function exists specifically for getting intersections, i.e. finding overlaps, after all.)


3. Good Old Foreach Loop

Of course a good old foreach loop will also work perfectly fine, again using array_flip() and isset() index lookups rather than in_array() or array_search():

$select = array_flip($select);

$selected_items = [];
foreach($items as $key => $val) {
    if (isset($select[$val['id']])) {
        $selected_items[] = $items[$key];
    }
}

I'd instinctively use this for large datasets (or long comparison lists) where "bare bones" performance is called for, going by "simpler is better". However, you likely won't see a big difference between this and the key intersection approach without massive data to process. (If someone has compared these methods for PHP 8.x, please share the benchmark results.)
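For anyone who wants to measure this on their own setup, here is a rough timing harness. It is only a sketch: the data is synthetic (10,000 rows and 2,500 wanted ids are arbitrary assumptions), $timeIt is a hypothetical helper, each variant runs just once, and it needs PHP 7.4+ for arrow functions. No timings are quoted here because they depend on the PHP version and hardware.

```php
// Synthetic data, assumed purely for illustration: 10,000 rows, 2,500 wanted ids.
$items = [];
for ($i = 0; $i < 10000; $i++) {
    $items[] = ['id' => $i, 'name' => "item$i"];
}
$select  = range(0, 9999, 4);
$flipped = array_flip($select);

// Hypothetical helper: runs a callable once, returns [result, elapsed milliseconds].
$timeIt = function (callable $fn): array {
    $start  = hrtime(true);                      // nanoseconds; needs PHP 7.3+
    $result = $fn();
    return [$result, (hrtime(true) - $start) / 1e6];
};

$runs = [
    'array_filter + in_array' => fn() => array_filter($items, fn($item) => in_array($item['id'], $select)),
    'array_filter + isset'    => fn() => array_filter($items, fn($item) => isset($flipped[$item['id']])),
    'array_intersect_key'     => fn() => array_intersect_key(array_column($items, null, 'id'), $flipped),
    'foreach + isset'         => function () use ($items, $flipped) {
        $out = [];
        foreach ($items as $item) {
            if (isset($flipped[$item['id']])) {
                $out[] = $item;
            }
        }
        return $out;
    },
];

$results = [];
foreach ($runs as $label => $run) {
    [$results[$label], $ms] = $timeIt($run);
    printf("%-24s %8.3f ms  (%d rows)\n", $label, $ms, count($results[$label]));
}
```

A single run is noisy, so for anything beyond a quick sanity check it would be better to repeat each variant several times and compare medians.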

Upvotes: 1

Decent Dabbler

Reputation: 22773

Something like this should do it, provided I've understood your question and data structure correctly:

$dataArray = [
  [ 'key' => 0, 'id' => 14  , 'name' => 'bob'  , 'style' => 'big'   , 'age' => 33  , 'whim' => 'no'  ],
  [ 'key' => 1, 'id' => 72  , 'name' => 'jill' , 'style' => 'big'   , 'age' => 22  , 'whim' => 'yes' ],
  [ 'key' => 2, 'id' => 39  , 'name' => 'sue'  , 'style' => 'yes'   , 'age' => 111 , 'whim' => 'yes' ],
  [ 'key' => 3, 'id' => 994 , 'name' => 'lucy' , 'style' => 'small' , 'age' => 23  , 'whim' => 'no'  ],
  [ 'key' => 4, 'id' => 15  , 'name' => 'sis'  , 'style' => 'med'   , 'age' => 24  , 'whim' => 'no'  ],
  [ 'key' => 5, 'id' => 16  , 'name' => 'maj'  , 'style' => 'med'   , 'age' => 87  , 'whim' => 'yes' ],
  [ 'key' => 6, 'id' => 879 , 'name' => 'Ike'  , 'style' => 'larg'  , 'age' => 56  , 'whim' => 'no'  ],
  [ 'key' => 7, 'id' => 286 , 'name' => 'Jed'  , 'style' => 'big'   , 'age' => 23  , 'whim' => 'yes' ]
];

$filterArray = [14, 72, 879];
$resultArray = array_filter( $dataArray, function( $row ) use ( $filterArray ) {
  return in_array( $row[ 'id' ], $filterArray );
} );

View this example on eval.in


However, your question appears to suggest this data might be coming from a database; is that correct? If so, it may be more efficient to pre-filter the results at the database level, either by adding a field to the SELECT query that holds a boolean indicating whether a row matched your filter ids, or by simply not returning the other rows at all.

Upvotes: 2

user2182349

Reputation: 9782

As @DecentDabbler mentioned, if the data is coming out of a database, using an IN condition in your WHERE clause will allow you to retrieve only the relevant data.
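If the rows did come from a database (the question says this particular array lives in a cache), the IN condition could be built along these lines. This is a sketch only: the table name people and the column list are hypothetical, and the PDO calls are left as comments because no connection is set up here.

```php
// Hypothetical table and column names; $arr2 is the id list from the question.
$arr2 = [14, 72, 8790];

// One "?" placeholder per id, so the values are bound rather than concatenated.
$placeholders = implode(',', array_fill(0, count($arr2), '?'));
$sql = "SELECT id, name, style, age, whim FROM people WHERE id IN ($placeholders)";

echo $sql, PHP_EOL;

// With a PDO connection in $pdo (not created here), the query could then be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($arr2);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

An empty id list would produce IN (), which is not valid SQL, so that case is worth guarding against before building the query.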

Another way to filter is to use array functions:

  • array_column extracts the values of the id column into an array
  • array_intersect returns the elements which are in both the id column of $arr1 and $arr2, keeping their original indices
  • array_flip flips the resulting array such that its values are the indices into $arr1 of the elements found in both $arr1 and $arr2

    $arr1 = [
        ['id' => 14,  'name' => 'bob'],
        ['id' => 72,  'name' => 'jill'],
        ['id' => 39,  'name' => 'sue'],
        ['id' => 994, 'name' => 'lucy'],
        ['id' => 879, 'name' => 'large'],
    ];

    $arr2 = [14, 72, 879];

    // Keys of $intersection are the matched ids; values are the indices into $arr1
    $intersection = array_flip(array_intersect(array_column($arr1, 'id'), $arr2));

    foreach ($intersection as $i) {
        var_dump($arr1[$i]);
    }
    

Upvotes: 1

Goma

Reputation: 1981

One way is a foreach loop with array_search():

$result = [];
foreach ($arr1 as $value) {                            // Loop through $arr1
    if (array_search($value['id'], $arr2) !== false) { // Check if id is in $arr2
        $result[] = $value;                            // Push to result if true
    }
}

// print result
print_r($result);

Upvotes: 1
