SlamJammington

Reputation: 62

Identifying and collecting potential duplicates in multidimensional array

I've got an array like the following one, and I want to identify the duplicates in it:

$names = array(
    array("Name" => "John Smith",      "ID" => 65), 
    array("Name" => "Richard Johnson", "ID" => 96), 
    array("Name" => "John Smith",      "ID" => 1105),
    ...
);

There are a lot of similar questions, but most of them just return a true/false value if there's a duplicate, or simply remove the duplicates. I apologize if there's an identical question with an answer; I looked but could not find anything that would work for my situation.

I just want to identify the arrays that contain the same "Name" value but possibly different "ID" values. I understand that this may produce duplicate results with inverted values, but I think I can figure that out on my own. I do NOT want to remove the duplicate values; I only want to identify them.

Ideally it would return an array similar to the following:

$dupes = array(
    array(
        array("Name" => "John Smith", "ID" => 65),
        array("Name" => "John Smith", "ID" => 1105)
    )
);

which I could then process into a more refined, user-friendly array.

I was thinking of using a recursive in_array function, or possibly a second working array. Any ideas?

Upvotes: 1

Views: 310

Answers (2)

Rob Ruchte

Reputation: 3707

For this type of situation, you would typically want to define some logic for creating a hash from the values in your records in order to determine equality. Once you have that defined, you can use simple looping and associative arrays to keep track of which records have duplicates.

<?php
/**
 * Define an algorithm for equality between records.
 *
 * @param $record
 * @return string
 */
function generateHashForUserRecord($record)
{
    return sha1($record['Name']);
}

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

// This map will be populated with all records, keyed by hash
$hashBuffer = [];

// Buffer for hashes that are associated with more than one record
$duplicateHashes = [];

// This will be populated with the duplicate records
$duplicateRecords = [];

// Iterate through all of the records
foreach($names as $currRecord)
{
    // Generate a hash for the record
    $currHash = generateHashForUserRecord($currRecord);

    // If the hash is not in the hashtable yet, create an array to hold entries with this hash
    if(!array_key_exists($currHash, $hashBuffer))
    {
        $hashBuffer[$currHash] = [];
    }
    else // If this hash is already in the buffer, we have a duplicate - add it to the $duplicateHashes array
    {
        // Key by the hash itself so each duplicate hash is recorded only once
        $duplicateHashes[$currHash] = $currHash;
    }

    // Add the record to the hash buffer
    $hashBuffer[$currHash][] = $currRecord;
}

foreach($duplicateHashes as $currDuplicateHash)
{
    $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
}

print_r($duplicateRecords);

That's a lot of ugly procedural code, so it may be a good idea to encapsulate it in some sort of helper class.

<?php

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105]
];

$duplicateRecords = UserRecordHelper::getDuplicateRecords($names);

print_r($duplicateRecords);

class UserRecordHelper
{
    public static function getDuplicateRecords($records)
    {
        // This map will be populated with all records, keyed by hash
        $hashBuffer = [];

        // Buffer for hashes that are associated with more than one record
        $duplicateHashes = [];

        // This will be populated with the duplicate records
        $duplicateRecords = [];

        // Iterate through all of the records
        foreach ($records as $currRecord)
        {
            // Generate a hash for the record
            $currHash = self::generateHashForUserRecord($currRecord);

            // If the hash is not in the hashtable yet, create an array to hold entries with this hash
            if (!array_key_exists($currHash, $hashBuffer))
            {
                $hashBuffer[$currHash] = [];
            }
            else // If this hash is already in the buffer, we have a duplicate - add it to the $duplicateHashes array
            {
                // Key by the hash itself so each duplicate hash is recorded only once
                $duplicateHashes[$currHash] = $currHash;
            }

            // Add the record to the hash buffer
            $hashBuffer[$currHash][] = $currRecord;
        }

        foreach ($duplicateHashes as $currDuplicateHash)
        {
            $duplicateRecords = array_merge($duplicateRecords, $hashBuffer[$currDuplicateHash]);
        }

        return $duplicateRecords;
    }

    public static function generateHashForUserRecord($record)
    {
        return sha1($record['Name']);
    }
}
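Both versions above merge every duplicate into one flat list, so the two John Smith records are not grouped the way the question's `$dupes` example is. If you want that grouped shape (one inner array per set of matching names), a minimal sketch is to bucket records by name and keep only buckets with more than one entry. The function name `groupDuplicateRecords` is hypothetical, and it keys on the raw `Name` string rather than a SHA-1 hash, since a PHP array key can hold the string directly; swap in a different key if your equality rule is more involved.

```php
<?php

/**
 * Hypothetical helper: group records that share the same 'Name'
 * and return only the groups that contain more than one record.
 */
function groupDuplicateRecords(array $records): array
{
    $groups = [];

    // Bucket every record under its name; the PHP array acts as the hash table
    foreach ($records as $record) {
        $groups[$record['Name']][] = $record;
    }

    // Keep only buckets with two or more records, then drop the name keys
    return array_values(array_filter($groups, function ($group) {
        return count($group) > 1;
    }));
}

$names = [
    ['Name' => 'John Smith', 'ID' => 65],
    ['Name' => 'Richard Johnson', 'ID' => 96],
    ['Name' => 'John Smith', 'ID' => 1105],
];

$dupes = groupDuplicateRecords($names);
print_r($dupes);
```

For the sample data this yields a single group holding the two John Smith records, matching the structure the question asks for.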

Upvotes: 3

futureweb

Reputation: 442

Why don't you just loop through and create a new array using the name as the key? You can test the code below here: http://phptester.net/

$names = array(
    array("Name" => 'John Smith', "ID" => 65), 
    array("Name" => 'Richard Johnson', "ID" => 96), 
    array("Name" => 'John Smith', "ID" => 1105)
);

$users = [];
foreach($names as $usersArray){
    
    $users[$usersArray['Name']]['ids'][] = $usersArray['ID'];
    
}

print_r($users);

Or, more simply:

$names = array(
    array("Name" => 'John Smith', "ID" => 65), 
    array("Name" => 'Richard Johnson', "ID" => 96), 
    array("Name" => 'John Smith', "ID" => 1105)
);

$users = [];
foreach($names as $usersArray){
    
    $users[$usersArray['Name']][] = $usersArray['ID'];
    
}

print_r($users);
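Both versions above list every name, including names that appear only once (Richard Johnson, for example). To keep just the duplicates, one option is to filter the grouped array afterwards. A minimal sketch building on the second version; the `$duplicates` variable name is my own:

```php
<?php

$names = array(
    array("Name" => 'John Smith', "ID" => 65),
    array("Name" => 'Richard Johnson', "ID" => 96),
    array("Name" => 'John Smith', "ID" => 1105)
);

// Group IDs by name, as in the loop above
$users = [];
foreach ($names as $usersArray) {
    $users[$usersArray['Name']][] = $usersArray['ID'];
}

// Keep only names that map to more than one ID
$duplicates = array_filter($users, function ($ids) {
    return count($ids) > 1;
});

print_r($duplicates);
```

`array_filter` preserves keys, so the result stays keyed by name, with each name mapping to the list of IDs that share it.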

Upvotes: 3
