I'm trying to work out a bigger problem, but I have simplified the issue for readability; ultimately, the logic below is the reason the extended program is failing.
I am using Perl to search for a short sequence of letters within a longer protein sequence, and if it is not found, I'd like to do some calculations. I don't know whether I'm going crazy, but I can't work out why this logic is failing.
sub calculateEpitopeMutations {
    my @mutationArray;
    my @epitopeArray;
    my $count;
    my $localEpitope;
    open( EPITOPESIN2, $ARGV[5] ) or die "Unable to open file $ARGV[5]\n";
    while ( my $line = <EPITOPESIN2> ) {
        chomp $line;
        push @epitopeArray, $line;
    }
    while ( my ( $key, $value ) = each our %sequencesForCalculation ) {
        foreach ( @epitopeArray ) {
            $localEpitope = $_;
            if ( $value =~ /($localEpitope)/g ) {
                print "$key\n$localEpitope\nexactly the same\n\n";
                next;
            }
            else {
                # This is where I'd like to do the further calculations
                print "$key\n$localEpitope\nthere is a difference\n\n";
                next;
            }
        }
    }
}
$ARGV[5] is the name of a text file containing a list of 9-character sequences, exactly like the following:
RVSENIQRF
SFQVDCFLW
The idea is to put these into the array @epitopeArray, iterate through them, and compare each one with all (currently just one) of the $value sequences in the hash %sequencesForCalculation.
%sequencesForCalculation is a hash where each $value is a long sequence of characters, like this:
MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK
Currently, the short 9-character sequence $localEpitope is contained in the longer sequence $value, so when I iterate through the program, I should get this printed every time. ($key contains a header of information about the protein sequences, but it is irrelevant, so I have shortened it to just the variable name.)
$key
RVSENIQRF
exactly the same
$key
SFQVDCFLW
exactly the same
$key
But instead I'm getting this
$key
RVSENIQRF
exactly the same
$key
SFQVDCFLW
there is a difference
$key
Any ideas? Please let me know if anything further is required.
TL;DR: You should change $value =~ /($localEpitope)/g to $value =~ /$localEpitope/
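A minimal sketch of the question's loop with that change applied. It assumes hypothetical stand-in data in place of the $ARGV[5] file and the real header: a single key named header1 holding the sequence from the question, plus the two epitopes listed above. Results are also collected in a hypothetical @results array so they can be checked.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the question's data: one header => sequence pair
# (the key "header1" is made up) and the two epitopes from the $ARGV[5] file.
my %sequencesForCalculation = (
    header1 => 'MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK',
);
my @epitopeArray = qw( RVSENIQRF SFQVDCFLW );

my @results;    # collects "key epitope verdict" strings for checking
while ( my ( $key, $value ) = each %sequencesForCalculation ) {
    for my $localEpitope (@epitopeArray) {
        # No /g modifier: every match starts from the beginning of $value,
        # so earlier searches cannot affect later ones.
        if ( $value =~ /$localEpitope/ ) {
            print "$key\n$localEpitope\nexactly the same\n\n";
            push @results, "$key $localEpitope same";
        }
        else {
            # further calculations would go here
            print "$key\n$localEpitope\nthere is a difference\n\n";
            push @results, "$key $localEpitope different";
        }
    }
}
```

With this version both epitopes are reported as "exactly the same".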
Okay, now that we know the real circumstances, the problem (as melpomene points out in his comment) is that you have the /g modifier on your pattern match. There's no reason for that; you don't want to check how many times the substring appears, you just want to know whether it's there at all.
The problem is that a variable subjected to a /g pattern match keeps a state that records where the last search ended. So you're searching for $epitopeArray[0] in the longer string and finding it, and then searching for $epitopeArray[1] starting from where the previous search ended. The first substring appears after the second one, so only the first is found.
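A sketch that reproduces this with the sequence and epitopes from the question, keeping the /g modifier. Only the surrounding loop is simplified; the @found array is a hypothetical addition used to record which epitopes matched.

```perl
use strict;
use warnings;

# The sequence from the question: SFQVDCFLW occurs near the start,
# RVSENIQRF near the end.
my $value = 'MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK';

my @found;
for my $epitope (qw( RVSENIQRF SFQVDCFLW )) {
    # With /g in scalar context the search resumes at pos($value),
    # not at the start of the string.
    if ( $value =~ /$epitope/g ) {
        push @found, $epitope;
        print "$epitope found; next search starts at offset ", pos($value), "\n";
    }
    else {
        print "$epitope not found\n";
    }
}
```

RVSENIQRF is found, after which the search for SFQVDCFLW only looks at the short tail of the string and fails.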
For more information on this behaviour, take a look at the pos function, which returns the current value of this state. For instance, pos($value) will return the character offset where the next m//g will start its search.
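A small sketch of pos in action, using a made-up test string. It shows the offset being set by a successful m//g, being reset by a failed one (the documented behaviour when the /c modifier is not used), and being cleared explicitly by assigning undef.

```perl
use strict;
use warnings;

my $s = 'xxxAAAxxxBBBxxx';
my @log;    # records pos($s) after each step, or 'undef' when it is unset
my $ok;

$ok = $s =~ /AAA/g;    # succeeds; pos($s) now points just past "AAA"
push @log, defined pos($s) ? pos($s) : 'undef';

$ok = $s =~ /AAA/g;    # resumes at that offset and finds no second "AAA"
push @log, defined pos($s) ? pos($s) : 'undef';    # a failed m//g resets pos

$ok = $s =~ /BBB/g;    # so this search starts from the beginning again
push @log, defined pos($s) ? pos($s) : 'undef';

pos($s) = undef;       # explicit reset
push @log, defined pos($s) ? pos($s) : 'undef';

print "@log\n";        # 6 undef 12 undef
```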
This short program demonstrates the problem. With the /g modifier, only BBB is found. Remove it and both are found.
use strict;
use warnings;
use 5.010;

my $long_s = 'xxxAAAxxxBBBxxx';

for my $substr ( qw/ BBB AAA / ) {
    if ( $long_s =~ /$substr/g ) {
        say "$substr okay";
    }
    else {
        say "$substr nope";
    }
}

output

BBB okay
AAA nope
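Since the goal is only a plain substring test, a regex isn't strictly necessary. As an alternative not taken from the answer above, Perl's built-in index function returns the offset of the first occurrence or -1 if there is none, and it keeps no state between calls. A sketch using the same test string (the @ok array is a hypothetical addition for checking):

```perl
use strict;
use warnings;

my $long_s = 'xxxAAAxxxBBBxxx';
my @ok;    # substrings that were found

for my $substr (qw( BBB AAA )) {
    # index() returns the offset of the first occurrence, or -1 if absent.
    # It does not use pos(), so the order of the searches doesn't matter.
    if ( index( $long_s, $substr ) != -1 ) {
        print "$substr okay\n";
        push @ok, $substr;
    }
    else {
        print "$substr nope\n";
    }
}
```

Both BBB and AAA are reported as okay here.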
You say:
"Currently, the small 9-character long sequence ($localEpitope) IS contained in the longer sequence ($value), and so when I iterate through the program, I should get the following printed everytime"
So $localEpitope is a substring of $value, and you're saying that $value =~ /($localEpitope)/g evaluates to true. That is correct behaviour: $value =~ /$localEpitope/ will check whether $localEpitope can be found anywhere in $value.
Unfortunately, it's not clear enough from what you've written to suggest a solution.