Tom

Reputation: 63

Find duplicates in an array of objects based on specific keys

My goal is to find duplicates in an array of objects, comparing only specific object properties.

Instead of using two nested foreach loops like the following, I am looking for a better (more elegant) way to find the duplicates:

foreach ($data as $date) {
    foreach ($data as $innerDate) {
        if ($date->birthday == $innerDate->birthday &&
            $date->street == $innerDate->street &&
            $date->streetnr == $innerDate->streetnr &&
            $date->zipcode == $innerDate->zipcode &&
            $date->twinid == $innerDate->twinid &&
            $date !== $innerDate) {
            // Duplicate
        }
    }
}

Thanks!


Now I'm using the following code, based on Tarilo's idea:

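usort($data, function ($obj_a, $obj_b) {
    // A comparator must return <0, 0 or >0: order by all compared properties
    return [$obj_a->birthday, $obj_a->street, $obj_a->streetnr, $obj_a->zipcode, $obj_a->twinid]
       <=> [$obj_b->birthday, $obj_b->street, $obj_b->streetnr, $obj_b->zipcode, $obj_b->twinid];
});

// After sorting, duplicates are adjacent: compare each object with its predecessor
for ($i = 1; $i < count($data); $i++) {
    $prev = $data[$i - 1];
    $date = $data[$i];
    if ($prev->birthday == $date->birthday &&
        $prev->street == $date->street &&
        $prev->streetnr == $date->streetnr &&
        $prev->zipcode == $date->zipcode &&
        $prev->twinid == $date->twinid) {
        // Duplicate
    }
}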

Looks much better than two nested foreach loops ;-)

Upvotes: 2

Views: 4755

Answers (3)

nicolask

Reputation: 1

This one gives you an array in which items with equal values for the chosen keys are grouped together. It should be faster for bigger datasets: two linear passes, O(2n) = O(n), plus the cost of the string concatenation and of counting the resulting groups. It just takes a little more memory because of the hash map.

$hashmap = array();
foreach ($data as $date) {
    $hash = $date->zipcode.'-'.$date->street.'-'.$date->streetnr.'-'.$date->birthday.'-'.$date->twinid;
    if (!array_key_exists($hash, $hashmap)) {
        $hashmap[$hash] = array();
    }
    $hashmap[$hash][] = $date;
}

foreach ($hashmap as $entry) {
    if (count($entry) > 1) {
        foreach ($entry as $date) {
            // $date is a duplicate
        }
    }
}
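The same grouping idea can be wrapped in a reusable function that takes the list of keys to compare. This is a minimal sketch, assuming the objects expose the compared properties as public fields (e.g. `stdClass`); `groupDuplicatesByKeys()` is a hypothetical helper name and the sample records are made up. It builds the group key with `serialize()` instead of joining with `'-'`, because values such as dates or house numbers like "12-14" can themselves contain `'-'`, which could make two different records produce the same key.

```php
<?php
/**
 * Groups objects whose values for the given properties are identical and
 * returns only the groups with more than one member (the duplicates).
 * groupDuplicatesByKeys() is a hypothetical helper, not a PHP built-in.
 */
function groupDuplicatesByKeys(array $data, array $keys): array
{
    $groups = [];
    foreach ($data as $obj) {
        $values = [];
        foreach ($keys as $key) {
            // Cast to string, like the concatenation in the answer above
            $values[] = (string) $obj->$key;
        }
        // serialize() is length-prefixed, so separators inside values cannot collide
        $groups[serialize($values)][] = $obj;
    }
    // Keep only the groups with at least two members and reindex them
    return array_values(array_filter($groups, function ($group) {
        return count($group) > 1;
    }));
}

// Made-up sample records for illustration
$data = [
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'A'],
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'B'],
    (object) ['birthday' => '1985-05-05', 'street' => 'Side St', 'streetnr' => '2', 'zipcode' => '54321', 'twinid' => 0, 'name' => 'C'],
];

$duplicates = groupDuplicatesByKeys($data, ['birthday', 'street', 'streetnr', 'zipcode', 'twinid']);
foreach ($duplicates as $group) {
    // prints "A, B"
    echo implode(', ', array_map(function ($o) { return $o->name; }, $group)), PHP_EOL;
}
```

The `name` property is deliberately not in the key list, so A and B count as duplicates even though their names differ.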

Upvotes: 0

Husni

Reputation: 1065

Since $data is an array, we can use the array_* functions.

Try this; it works on my end (PHP 5.2.0).

if ($data != array_unique($data)) {
    echo 'oops, this variable has one or more duplicate item(s)'; die;
}
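By default `array_unique()` treats two elements as equal when their string representations are identical, so applied directly to `$data` it compares whole elements rather than the chosen properties, and objects can only be compared if they are convertible to strings (i.e. implement `__toString()`). A minimal sketch of the same idea restricted to specific keys, assuming public properties: first map each object to a string built from the relevant values, then let `array_unique()` compare those strings. `hasDuplicatesByKeys()` is a hypothetical helper name and the sample data is made up.

```php
<?php
/**
 * Returns true if at least two objects share the same values for all of the
 * given properties. hasDuplicatesByKeys() is a hypothetical helper.
 */
function hasDuplicatesByKeys(array $data, array $keys): bool
{
    // Map every object to one comparable string built from the chosen properties
    $signatures = array_map(function ($obj) use ($keys) {
        $values = [];
        foreach ($keys as $key) {
            $values[] = (string) $obj->$key;
        }
        return serialize($values);
    }, $data);

    // array_unique() drops repeated strings, so a shorter result means duplicates
    return count(array_unique($signatures)) < count($signatures);
}

// Made-up sample records: equal on the compared keys, different names
$keys = ['birthday', 'street', 'streetnr', 'zipcode', 'twinid'];
$data = [
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'A'],
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'B'],
];

if (hasDuplicatesByKeys($data, $keys)) {
    echo 'oops, this variable has one or more duplicate item(s)', PHP_EOL;
}
```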

Upvotes: 0

Tarilo

Reputation: 430

You could sort the array first and then loop over the sorted array. This way you only have to compare each object with the next (or previous) one. Your current algorithm is O(n^2); after sorting it would be O(n log n) for the sort plus O(n) for the loop, i.e. O(n log n) overall, where n is the number of objects in your array.
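A minimal sketch of this sort-then-scan approach, assuming the compared properties are public fields. `findDuplicatesBySorting()` is a hypothetical helper name and the sample data is made up. Each object is paired with a string signature of its compared values, and the pairs are sorted with `strcmp()`, which gives a consistent byte-wise order; note that values are therefore compared exactly as strings (so `'01'` and `'1'` differ, unlike with the loose `==` in the question).

```php
<?php
/**
 * Sort-based duplicate search: once the records are ordered by the compared
 * properties, equal records sit next to each other, so each record only has
 * to be compared with its predecessor. Hypothetical helper, not a built-in.
 */
function findDuplicatesBySorting(array $data, array $keys): array
{
    // Pair each object with a string made from the compared property values
    $rows = [];
    foreach ($data as $obj) {
        $values = [];
        foreach ($keys as $key) {
            $values[] = (string) $obj->$key;
        }
        $rows[] = [serialize($values), $obj];
    }

    // O(n log n): sort the pairs by their signature
    usort($rows, function ($a, $b) {
        return strcmp($a[0], $b[0]);
    });

    // O(n): collect runs of neighbours with identical signatures
    $groups = [];
    $run = [];
    foreach ($rows as $i => $row) {
        if ($i > 0 && $rows[$i - 1][0] !== $row[0]) {
            if (count($run) > 1) {
                $groups[] = $run; // the finished run contains duplicates
            }
            $run = [];
        }
        $run[] = $row[1];
    }
    if (count($run) > 1) {
        $groups[] = $run;
    }
    return $groups;
}

// Made-up sample records: A and C are duplicates but not adjacent initially
$data = [
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'A'],
    (object) ['birthday' => '1985-05-05', 'street' => 'Side St', 'streetnr' => '2', 'zipcode' => '54321', 'twinid' => 0, 'name' => 'B'],
    (object) ['birthday' => '1990-01-01', 'street' => 'Main St', 'streetnr' => '1', 'zipcode' => '12345', 'twinid' => 0, 'name' => 'C'],
];

$groups = findDuplicatesBySorting($data, ['birthday', 'street', 'streetnr', 'zipcode', 'twinid']);
foreach ($groups as $group) {
    echo 'Group of ', count($group), ' duplicates', PHP_EOL;
}
```

Precomputing the signatures once avoids rebuilding them inside the comparator, which `usort()` calls O(n log n) times.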

Upvotes: 3
