Reputation: 55

how to match two sequences using arrays in perl

When looping through two arrays, I am confused about how to move the pointer through one loop but keep it constant in another. So for example:

Array 1: A T C G T C G A G C G
Array 2: A C G T C C T G T C G

So A in the first array matches A in the second array so we move on to next elements. But since the T doesn't match the C in the 2nd index, I want the program to compare that T to the next G in array 2 and so on until it finds the matching T.

my ($array1ref, $array2ref) = @_;

my @array1 = @$array1ref;
my @array2= @$array2ref;
my $count = 0; 
foreach my $element (@array1) {
 foreach my $element2 (@array2) {
 if ($element eq $element2) {
 $count++;
  }else { ???????????


}

Upvotes: 4

Answers (7)

Ken Williams

Reputation: 23975

As you may know, your problem is called Sequence Alignment. There are well-developed algorithms to do this efficiently, and one such module Algorithm::NeedlemanWunsch is available on CPAN. Here's how you might apply it to your problem.

#!/usr/bin/perl

use Algorithm::NeedlemanWunsch;

my $arr1 = [qw(A T C G T C G A G C G)];
my $arr2 = [qw(A C G T C C T G T C G)];

my $matcher = Algorithm::NeedlemanWunsch->new(sub {@_==0 ? -1 : $_[0] eq $_[1] ? 1 : -2});

my (@align1, @align2);
my $result = $matcher->align($arr1, $arr2,
  {
   align   => sub {unshift @align1, $arr1->[shift]; unshift @align2, $arr2->[shift]},
   shift_a => sub {unshift @align1, $arr1->[shift]; unshift @align2,            '.'},
   shift_b => sub {unshift @align1,            '.'; unshift @align2, $arr1->[shift]},
  });

print join("", @align1), "\n";
print join("", @align2), "\n";

That prints out an optimal solution in terms of the costs we specified in the constructor:

ATCGT.C.GAGCG
A.CGTTCGG.TCG

A very different method from the one in your original question, but I think it's worth knowing about.

Upvotes: 0

Joe Z

Reputation: 17936

It seems like you could do this pretty easily with a 'grep', if you're guaranteed that array2 is always as long as or longer than array1. Something like this:

sub align
{
    my ($array1, $array2) = @_;
    my $index = 0;

    return grep
           {
               $array1->[$index] eq $array2->[$_] ? ++$index : 0
           } 0 .. scalar( @$array2 ) - 1;
}

Basically, the grep is saying "Return me the list of increasing indices into array2 that match contiguous elements from array1."

If you run the above with this test code, you can see it returns the expected alignment array:

my @array1 = qw(A T C G T C G A G C G);
my @array2 = qw(A C G T C C T G T C G);

say join ",", align \@array1, \@array2;

This outputs the expected mapping: 0,3,4,7,8,9,10. That list means that @array1[0 .. 6] correspond to @array2[0,3,4,7,8,9,10].

(Note: You need to use Modern::Perl or similar to use say.)

Now, you haven't really said what you need the output of the operation to be. I've assumed you wanted this mapping array. If you just need a count of the number of elements skipped in @array2 while aligning it with @array1, you can still use the grep above, but instead of the list, just return scalar(@$array2) - $index at the end.

Upvotes: 0

Andomar

Reputation: 238076

You could use a while loop to search for matches. If you find a match, advance in both arrays. If you don't, advance the second array. At the end you could print the remaining unmatched characters from the first array:

# [1, 2, 3] is a reference to an anonymous array (1, 2, 3)
# qw(1, 2, 3) is shorthand quoted-word for ('1', '2', '3')
my $arr1 = [qw(A T C G T C G A G C G)];
my $arr2 = [qw(A C G T C C T G T C G)];

my $idx1 = 0;
my $idx2 = 0;

# Find matched characters
# @$arr_ref is the size of the array referenced by $arr_ref
while ($idx1 < @$arr1 && $idx2 < @$arr2) {
    my $char1 = $arr1->[$idx1];
    my $char2 = $arr2->[$idx2];
    if ($char1 eq $char2) {
        # Matched character, advance arr1 and arr2
        printf("%s %s  -- arr1[%d] matches arr2[%d]\n", $char1, $char2, $idx1, $idx2);
        ++$idx1;
        ++$idx2;
    } else {
        # Unmatched character, advance arr2
        printf(". %s  -- skipping arr2[%d]\n", $char2, $idx2);
        ++$idx2;
    }
}

# Remaining unmatched characters
while ($idx1 < @$arr1) {
    my $char1 = $arr1->[$idx1];
    printf("%s .  -- arr1[%d] is beyond the end of arr2\n", $char1, $idx1);
    $idx1++;
}

The script prints:

A A  -- arr1[0] matches arr2[0]
. C  -- skipping arr2[1]
. G  -- skipping arr2[2]
T T  -- arr1[1] matches arr2[3]
C C  -- arr1[2] matches arr2[4]
. C  -- skipping arr2[5]
. T  -- skipping arr2[6]
G G  -- arr1[3] matches arr2[7]
T T  -- arr1[4] matches arr2[8]
C C  -- arr1[5] matches arr2[9]
G G  -- arr1[6] matches arr2[10]
A .  -- arr1[7] is beyond the end of arr2
G .  -- arr1[8] is beyond the end of arr2
C .  -- arr1[9] is beyond the end of arr2
G .  -- arr1[10] is beyond the end of arr2

Upvotes: 3

ikegami

Reputation: 385685

Nested loops makes no sense. You don't want to loop over anything more than once.

You didn't specify what you wanted to happen after you resync, so you'll want to start with the following and tailor it to your needs.

my ($array1, $array2) = @_;

my $idx1 = 0;
my $idx2 = 0;
while ($idx1 < @$array1 && $idx2 < @$array2) {
   if ($array1->[$idx1] eq $array2->[$idx2]) {
      ++$idx1;
      ++$idx2;
   } else {
      ++$idx2;
   }
}

...

As is, the above snippet will leave $idx1 at the last index it couldn't (eventually) resync. If instead you want to stop as soon as you first resync, you want

my ($array1, $array2) = @_;

my $idx1 = 0;
my $idx2 = 0;
my $mismatch = 0;
while ($idx1 < @$array1 && $idx2 < @$array2) {
   if ($array1->[$idx1] eq $array2->[$idx2]) {
      last if $mismatched;          
      ++$idx1;
      ++$idx2;
   } else {
      ++$mismatched;
      ++$idx2;
   }
}

...

Upvotes: 2

Björn Labitzke

Reputation: 126

Here is one possibility. It will use indexes to go through both lists.

my @array1 = qw(A T C G T C G A G C G);
my @array2 = qw(A C G T C C T G T C G);

my $count = 0;
my $idx1 = 0;
my $idx2 = 0;

while(($idx1 < scalar @array1) && ($idx2 < scalar @array2)) {
    if($array1[$idx1] eq $array2[$idx2]) {
        print "Match of $array1[$idx1] array1 \@ $idx1 and array2 \@ $idx2\n";
        $idx1++;
        $idx2++;
        $count++;
    } else {
        $idx2++;
    }
}

print "Count = $count\n";

Upvotes: 0

user1937198

Reputation: 5348

This is to complex a condition for for loops use while loops instead

my ($array1ref, $array2ref) = @_;

my @array1 = @$array1ref;
my @array2= @$array2ref;
my $count = 0;
my ($index, $index2) = (0,0);
#loop while indexs are in arrays
while($index <= @#array1 && $index2 <= @#array2) { 
    if($array1[$index] eq $array2[$index2]) {
        $index++;
        $index2++;
    } else {
        #increment index until we find a match
        $index2++ until $array1[$index] eq $array2[$index2];
    }
}

Upvotes: 0

amon

Reputation: 57600

The foreach loops won't cut it: We'll either want to loop while there are available elements in both arrays, or iterate through all indices, which we can increment as we like:

EL1: while (defined(my $el1 = shift @array1) and @array2) {
  EL2: while(defined(my $el2 = shift @array2)) {
    ++$count and next EL1 if $el1 eq $el2; # break out of inner loop
  }
}

my $j = 0; # index of @array2
for (my $i = 0; $i <= $#array1; $i++) {
  $j++ until $j > $#array or $array1[$i] eq $array2[$j];
  last if $j > $#array;
  $count++;
}

or any combination.

Upvotes: 0

how to match two sequences using arrays in perl

Answers (7)

Related Questions