Sipra Moon

Reputation: 81

Perl Mismatch among arrays

I have two arrays:

@array1 = (A,B,C,D,E,F);
@array2 = (A,C,H,D,E,G);

The arrays could be of different sizes. I want to find how many mismatches there are between the arrays, comparing elements at the same index only. In this case there are three mismatches: B->C, C->H and F->G (i.e., the 'C' in $array1[2] should not be considered a match for the 'C' in $array2[1]). I would like to get the number of mismatches as well as the mismatches themselves.

foreach my $a1 ( 0 .. $#array1) {
 foreach my $a2 ( 0 .. $#array2) {
  if($array1[$a1] ne $array2[$a2]) {

   }
 }
}

my %array_one = map {$_, 1} @array1;
my @difference = grep {!$array_one{$_}} @array2;

print "@difference\n";

This gives me H and G, but not C.

With my limited Perl knowledge I tried the above, without success. Could you suggest how I should approach this? Any suggestions and pointers would be very helpful.

Upvotes: 1

Views: 638

Answers (6)

Brad Gilbert

Reputation: 34120

Here's an example using each_arrayref from List::MoreUtils.

sub diff_array{
  use List::MoreUtils qw'each_arrayref';
  return unless @_ && defined wantarray;
  my @out;

  my $iter = each_arrayref(@_);

  my $index = 0;
  while( my @current = $iter->() ){
    next if all_same(@current);

    unshift @current, $index;
    push @out, \@current;
  }continue{ ++$index }

  return @out;
}

This version should be faster if you often need only the number of differences. The output is exactly the same; it just doesn't have to work as hard when only a count is requested.
Read about wantarray for more information.

sub diff_array{
  use List::MoreUtils qw'each_arrayref';
  return unless @_ && defined wantarray;

  my $iter = each_arrayref(@_);

  if( wantarray ){
    # return structure
    my @out;

    my $index = 0;
    while( my @current = $iter->() ){
      next if all_same(@current);

      unshift @current, $index;
      push @out, \@current;
    }continue{ ++$index }

    return @out;

  }else{
    # only return a count of differences
    my $out = 0;
    while( my @current = $iter->() ){
      ++$out unless all_same @current;
    }
    return $out;
  }
}
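
For readers unfamiliar with wantarray, here is a minimal sketch of the three values it can return. The helper name context_of is hypothetical and used only for illustration; it is not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the context in which it was called.
sub context_of {
    return 'list'   if wantarray;          # true in list context
    return 'scalar' if defined wantarray;  # false but defined in scalar context
    return 'void';                         # undef in void context
}

my @in_list   = context_of();   # called in list context
my $in_scalar = context_of();   # called in scalar context

print "$in_list[0]\n";   # list
print "$in_scalar\n";    # scalar
```

diff_array relies on the same distinction: it builds the full structure only when called in list context and just counts otherwise.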

diff_array uses the subroutine all_same to determine if all of the current list of elements are the same.

sub all_same{
  my $head = shift;
  return undef unless @_; # not enough arguments
  for( @_ ){
    return 0 if $_ ne $head; # at least one mismatch
  }
  return 1; # all are the same
}

To get just the number of differences:

print scalar diff_array \@array1, \@array2;
my $count  = diff_array \@array1, \@array2;

To get a list of differences:

my @list = diff_array \@array1, \@array2;

To get both:

my $count = my @list = diff_array \@array1, \@array2;

The output for the input you provided:

(
  [ 1, 'B', 'C' ],
  [ 2, 'C', 'H' ],
  [ 5, 'F', 'G' ]
)

Example usage

my @a1 = qw'A B C D E F';
my @a2 = qw'A C H D E G';

my $count = my @list = diff_array \@a1, \@a2;

print "There were $count differences\n\n";

for my $group (@list){
  my $index = shift @$group;
  print "  At index $index\n";
  print "    $_\n" for @$group;
  print "\n";
}

Upvotes: 1

ikegami

Reputation: 385764

You shouldn't have nested loops. You only need to go through the indexes once.

use feature qw( say );
use List::Util qw( max );

my @mismatches;
for my $i (0..max($#array1, $#array2)) {
   # An index that exists in only one array counts as a mismatch.
   push @mismatches, $i
      if $i >= @array1
      || $i >= @array2
      || $array1[$i] ne $array2[$i];
}

say "There are " . (0+@mismatches) . " mismatches";
for my $i (@mismatches) {
   ...
}

Since you mentioned grep, this is how you'd replace the for with grep:

use feature qw( say );
use List::Util qw( max );

my @mismatches =
    grep {  $_ >= @array1
         || $_ >= @array2
         || $array1[$_] ne $array2[$_] }
    0 .. max($#array1, $#array2);

say "There are " . (0+@mismatches) . " mismatches";
for my $i (@mismatches) {
   ...
}
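
As a sketch of how the index list might be used to report both the count and the mismatched pairs, the following wraps the grep in a sub and prints each pair for the arrays from the question. The sub name mismatch_indexes and the '(none)' placeholder for a missing element are assumptions made for this example, not part of the answer above.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical wrapper: returns the indexes at which two arrays differ.
# An index that exists in only one array counts as a mismatch.
sub mismatch_indexes {
    my ($left, $right) = @_;
    return grep {
           $_ >= @$left
        || $_ >= @$right
        || $left->[$_] ne $right->[$_]
    } 0 .. max($#$left, $#$right);
}

my @array1 = qw( A B C D E F );
my @array2 = qw( A C H D E G );

my @mismatches = mismatch_indexes(\@array1, \@array2);
print "There are " . @mismatches . " mismatches\n";   # There are 3 mismatches

for my $i (@mismatches) {
    # '(none)' is a placeholder chosen here for an element past the end.
    my $l = $i < @array1 ? $array1[$i] : '(none)';
    my $r = $i < @array2 ? $array2[$i] : '(none)';
    print "$i: $l -> $r\n";   # 1: B -> C, then 2: C -> H, then 5: F -> G
}
```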

Upvotes: 4

Marcelo Cantos

Reputation: 185852

The following code builds a list of mismatched pairs, then prints them out.

my @a1 = qw(A B C D E F);
my @a2 = qw(A C H D E G);
# Compare only the indexes that exist in both arrays.
my @diff = map { [$a1[$_] => $a2[$_]] }
           grep { $a1[$_] ne $a2[$_] }
                (0..($#a1 < $#a2 ? $#a1 : $#a2));
print "$_->[0]->$_->[1]\n" for @diff;

Upvotes: 1

derobert

Reputation: 51147

Well, first, you're going to want to go over each element of one of the arrays, and compare it to the same element of the other array. List::MoreUtils provides an easy way to do this:

use v5.14;
use List::MoreUtils qw(each_array);

my @a = qw(a b c d);
my @b = qw(1 2 3);

my $ea = each_array @a, @b;
while ( my ($a, $b) = $ea->() ) {
    say "a = $a, b = $b, idx = ", $ea->('index');
}

You can extend that to find where there is a non-match by checking inside that while loop (note: this assumes your arrays don't have undefs at the end, or that if they do, undef is the same as having a shorter array):

my @mismatch;
my $ea = each_array @a, @b;
while ( my ($a, $b) = $ea->() ) {
    if (defined $a != defined $b || $a ne $b) {
        push @mismatch, $ea->('index');
    }
}

and then:

say "Mismatched count = ", scalar(@mismatch), " items are: ", join(q{, }, @mismatch);

Upvotes: 1

Russell Zahniser

Reputation: 16364

You have the right idea, but you only need a single loop, since you are looking at each index and comparing entries between the arrays:

foreach my $a1 ( 0 .. $#array1 ) {
  if ($array1[$a1] ne $array2[$a1]) {
    print "$a1: $array1[$a1] <-> $array2[$a1]\n";
  }
}

Upvotes: 0

Kirsten Jones

Reputation: 2706

You're iterating over both arrays when you don't want to be doing so.

@array1 = ("A","B","C","D","E","F");
@array2 = ("A","C","H","D","E","G");
foreach my $index (0 .. $#array1) {
   if ($array1[$index] ne $array2[$index]) {
       print "Arrays differ at index $index: $array1[$index] and $array2[$index]\n";
   }
}

Output:

Arrays differ at index 1: B and C
Arrays differ at index 2: C and H
Arrays differ at index 5: F and G

Upvotes: 1
