Analyzer

Reputation: 89

Find duplicate arrays and the intersection of arrays in a hash of arrays using Perl

I want to find duplicate arrays in a hash whose values are arrays. I am building sets and storing them in a Perl hash. Afterwards, I need to extract: 1. the arrays that are complete duplicates (all values the same); 2. the intersection of the arrays.

The source code is given below:


use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Bob", "Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Bob",  "Grook", "Franky");
my @test5= ();
my @test6=();

my %arrayHash= ( "ppl1" => [@test1],
             "ppl2"=> [@test2], 
             "ppl3" => [@test3],
             "ppl4"=> [@test4], 
             "ppl5"=> [@test5],
             "ppl6"=> [@test6],  

            );


Required output: ppl1 and ppl3 have duplicate lists
Intersection of arrays = Bob

Note that empty arrays should not be reported as duplicates!

Upvotes: 2

Views: 261

Answers (3)

Arunesh Singh

Reputation: 3535

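You need to compare the arrays stored under each pair of hash keys for equality. You can use the smartmatch operator (~~) for that comparison; note that it has been marked experimental since Perl 5.18, so newer perls warn when it is used.

Then you can use grep to pick out the keys whose arrays match, and a hash to keep track of the keys that have already been reported.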

#!/usr/bin/perl
use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");
my @test5= ("Bob", "Flip", "David");
my @test6= ("Kevin", "John", "Michel");
my @test7= ("Haidi", "Grook", "Frank4");


my %arrayHash= ( "ppl1" => [@test1],
                 "ppl2"=> [@test2],
                 "ppl3" => [@test3],
                 "ppl4"=> [@test4],
                 "ppl5"=> [@test5],
                 "ppl6"=> [@test6],
                 "ppl7"=> [@test7]
                );

my %seen;
foreach my $key1 (sort keys %arrayHash) {
    # skip empty arrays and keys already reported as part of a group
    next unless @{ $arrayHash{$key1} };
    next if exists $seen{$key1};

    # collect every other key whose array matches element by element
    my @keys = grep {
        $_ ne $key1 && @{ $arrayHash{$key1} } ~~ @{ $arrayHash{$_} }
    } sort keys %arrayHash;

    if (@keys) {
        unshift @keys, $key1;
        print "@keys are duplicates \n";
        @seen{@keys} = @keys;
    }
}

output:

ppl1 ppl3 ppl5 are duplicates 
ppl2 ppl6 are duplicates

Upvotes: 0

Sobrique

Reputation: 53478

So there's a set of steps here:

  • Compare your arrays with one another. This is harder because they are multi-element arrays: you can't test them for equivalence directly, you have to compare their members.

  • Filter one from the other.

So first of all:

(Edit: now copes with empty arrays.)

#!/usr/bin/env perl

use strict;
use warnings;

my @test1 = ( "Bob",   "Flip",  "David" );
my @test2 = ( "Kevin", "John",  "Michel" );
my @test3 = ( "Bob",   "Flip",  "David" );
my @test4 = ( "Haidi", "Grook", "Franky" );
my @test5 = ();
my @test6 = ();

my %arrayHash = (
    "ppl1" => [@test1],
    "ppl2" => [@test2],
    "ppl3" => [@test3],
    "ppl4" => [@test4],
    "ppl5" => [@test5],
    "ppl6" => [@test6],

);

my %seen;

#cycle through the hash
foreach my $key ( sort keys %arrayHash ) {

    #skip empty:
    next unless @{ $arrayHash{$key} };

    #turn your array into a string - ':' separated
    my $value_str = join( ":", sort @{ $arrayHash{$key} } );

    #check if that 'value string' has already been seen
    if ( $seen{$value_str} ) {
        print "$key is a duplicate of $seen{$value_str}\n";
    }
    $seen{$value_str} = $key;
}

Note that this is a bit of a cheat: it joins each array's elements with ':', which doesn't work in every scenario.

("Bob:", "Flip") and ("Bob", ":Flip") will end up as the same string.

If an array occurs more than twice, each duplicate is reported only against the most recently seen key, not against all of them.

You can work around this, if you want, by pushing multiple values into the %seen hash.
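A minimal sketch of that workaround, using the data from the question. The sub names duplicate_groups and intersection are made up for illustration, and "\0" is an assumed separator that is less likely than ':' to appear in a name but is still not collision-proof. The intersection part is not covered by the code above; it is one possible way to get the second result the question asks for, counting in how many non-empty arrays each value appears.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %arrayHash = (
    ppl1 => [ "Bob",   "Flip",  "David" ],
    ppl2 => [ "Bob",   "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob",   "Flip",  "David" ],
    ppl4 => [ "Haidi", "Bob",   "Grook", "Franky" ],
    ppl5 => [],
    ppl6 => [],
);

# Group keys by array content (hypothetical helper).
# Each value string maps to a list of keys instead of a single key.
sub duplicate_groups {
    my ($hash) = @_;
    my %seen;
    for my $key ( sort keys %$hash ) {
        next unless @{ $hash->{$key} };    # skip empty arrays
        my $value_str = join "\0", sort @{ $hash->{$key} };
        push @{ $seen{$value_str} }, $key;
    }

    # keep only groups with more than one member, ordered by first key
    my @groups = grep { @$_ > 1 } values %seen;
    return sort { $a->[0] cmp $b->[0] } @groups;
}

# Values present in every non-empty array (hypothetical helper).
sub intersection {
    my ($hash) = @_;
    my @lists = grep { @$_ } values %$hash;
    my %count;
    for my $list (@lists) {
        my %in_list = map { $_ => 1 } @$list;    # count each value once per array
        $count{$_}++ for keys %in_list;
    }
    my @common = grep { $count{$_} == @lists } keys %count;
    return sort @common;
}

print "@$_ have duplicate lists\n" for duplicate_groups( \%arrayHash );
print "Intersection of arrays = @{[ intersection( \%arrayHash ) ]}\n";
```

Because every key is pushed onto the list for its value string, a group of three or more identical arrays is reported on one line rather than pairwise.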

Upvotes: 1

Sameer Naik

Reputation: 1412

use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");

my %arrayHash= ( "1" => \@test1,
             "2"=> \@test2,
             "3" => \@test3,
             "4"=> \@test4,

            );

sub arrayCmp {
        my @array1 = @{$_[0]};
        my @array2 = @{$_[1]};

        return 0 if ($#array1 != $#array2);

        @array1 = sort(@array1);
        @array2 = sort(@array2);

        for (my $ii = 0; $ii <= $#array1; $ii++) {
                if ($array1[$ii] ne $array2[$ii]) {
                        #print "$array1[$ii] != $array2[$ii]\n";
                        return 0;
                }
        }

        return 1;
}


my @keyArr = sort(keys(%arrayHash));
for(my $i = 0; $i <= $#keyArr - 1; $i++) {

        my @arr1 = @{$arrayHash{$keyArr[$i]}};

        # start at $i + 1 so each pair is compared, and reported, only once
        for(my $j = $i + 1; $j <= $#keyArr; $j++) {
                my @arr2 = @{$arrayHash{$keyArr[$j]}};
                if ($keyArr[$i] ne $keyArr[$j] && arrayCmp(\@arr1, \@arr2) == 1) {
                        print "$keyArr[$i] and $keyArr[$j] are duplicates\n";
                }
        }
}

This outputs:

1 and 3 are duplicates

Upvotes: 0
