user1146453

Reputation: 1

Nested Loops in Perl - Failing logic

I'm trying to work out a bigger problem, but I have simplified the issue for readability. Ultimately, the logic below is the reason the extended program is failing.

I am using Perl to search for a short sequence of letters within a larger sequence (protein sequences), and if it is not found, then I'd like to do some calculations. I don't know whether I'm going crazy, but I can't work out why this logic is failing.

sub calculateEpitopeMutations {

    my @mutationArray;
    my @epitopeArray;
    my $count;
    my $localEpitope;

    open( EPITOPESIN2, $ARGV[5] ) or die "Unable to open file $ARGV[5]\n";
    while ( my $line = <EPITOPESIN2> ) {
        chomp $line;
        push @epitopeArray, $line;
    }

    while ( my ( $key, $value ) = each our %sequencesForCalculation ) {

        foreach ( @epitopeArray ) {

            $localEpitope = $_;

            if ( $value =~ /($localEpitope)/g ) {
                print "$key\n$localEpitope\nexactly the same\n\n";
                next;
            }
            else {

                #This is where I'd like to do the further calculations

                print "$key\n$localEpitope\nthere is a difference\n\n";
                next;
            }
        }
    }
}

$ARGV[5] is the name of a text file containing a list of 9-character sequences, exactly like the following

RVSENIQRF
SFQVDCFLW

The idea is to put these into array @epitopeArray and iterate through these, and compare them with all (currently just one) $value sequences in the hash %sequencesForCalculation.

%sequencesForCalculation is a hash, where $value is a long sequence of characters, like this

MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK

Currently, the small 9-character long sequence $localEpitope is contained in the longer sequence $value so when I iterate through the program, I should get this printed every time.

($key contains a header of information about the protein sequences, but is irrelevant so I have shortened it to just the variable name.)

$key
RVSENIQRF
exactly the same
$key
SFQVDCFLW
exactly the same
$key

But instead I'm getting this

$key
RVSENIQRF
exactly the same
$key
SFQVDCFLW
there is a difference
$key

Any ideas? Please let me know if anything further is required.

Upvotes: 0

Views: 61

Answers (1)

Borodin

Reputation: 126722

Update

TL;DR: You should change $value =~ /($localEpitope)/g to $value =~ /$localEpitope/

Okay, now that we know the real circumstances, the problem (as melpomene points out in his comment) is that you have the /g modifier on your pattern match. There's no reason for that; you don't want to check how many times the substring appears, you just want to know whether it's there at all

The problem is that a variable subjected to a /g pattern match in scalar context (such as an if condition) keeps a state that says where the last successful match ended. So you're searching for $epitopeArray[0] in the longer string and finding it, and then searching for $epitopeArray[1] starting from where the previous match ended. The first substring appears after the second one in $value, so only the first is found

For more information on this behaviour, take a look at the pos function which returns the current value of this state. For instance pos($value) will return the character offset where the next m//g will start its search
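To make that state visible, here is a minimal sketch that records pos after each scalar-context /g match. The helper name offsets_after_g_matches and the test string are made up for illustration; only pos and m//g themselves are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: run each substring against the same scalar with /g
# in scalar context and record pos() after every attempt.
sub offsets_after_g_matches {
    my ( $long_s, @substrs ) = @_;
    my @offsets;
    for my $substr (@substrs) {
        my $found = $long_s =~ /$substr/g;    # scalar context, so pos() is updated
        # pos() is reset to undef after a failed match, so report that explicitly
        push @offsets, $found ? pos($long_s) : 'undef';
    }
    return @offsets;
}

my @offsets = offsets_after_g_matches( 'xxxAAAxxxBBBxxx', 'BBB', 'AAA' );
print "@offsets\n";    # prints "12 undef"
```

The first match leaves pos at 12, just past BBB, so the search for AAA starts there and fails. The failure then resets pos to undef, which is why a further search would start from the beginning again.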

This short program demonstrates the problem. With the /g modifier only BBB is found. Remove it and both are found

use strict;
use warnings;
use 5.010;

my $long_s = 'xxxAAAxxxBBBxxx';

for my $substr ( qw/ BBB AAA / ) {
  if ( $long_s =~ /$substr/g ) {
    say "$substr okay";
  }
  else {
    say "$substr nope";
  }
}

output

BBB okay
AAA nope
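For comparison, here is a sketch of the same loop without the /g modifier. The contains helper is a name invented for this example, and the \Q...\E quoting is an extra precaution that is not part of the original code; it only matters if the substring could contain regex metacharacters.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: true if $needle occurs anywhere in $haystack.
# No /g modifier, so every match starts from the beginning of the string.
sub contains {
    my ( $haystack, $needle ) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $long_s = 'xxxAAAxxxBBBxxx';

for my $substr ( qw/ BBB AAA / ) {
    say "$substr ", contains( $long_s, $substr ) ? 'okay' : 'nope';
}
```

This should report okay for both BBB and AAA, regardless of the order in which they are tested.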

Original

You say

Currently, the small 9-character long sequence ($localEpitope) IS contained in the longer sequence ($value), and so when I iterate through the program, I should get the following printed everytime

So $localEpitope is a substring of $value and you're saying that

$value =~ /($localEpitope)/g

evaluates to true

That is correct behaviour. $value =~ /$localEpitope/ will check whether $localEpitope can be found anywhere in $value

Unfortunately it's not clear enough from what you've written to suggest a solution

Upvotes: 2
