ES55

Regular expressions, matching operator using a string variable in Perl

I am using a regex but am getting some odd, unexpected "matches". "Names" are sent to a subroutine to be compared against an array called @ASlist, which contains multiple tab-separated rows. The first element of each row is also a name, followed by zero or more synonyms. The goal is to match the incoming "name" to any row in @ASlist that has a matching cell.

Sample input, from which $name is derived for the comparison against @ASlist:

13  1   13  chr7    7   70606019    74345818    Otud7a  Klf13   E030018B13Rik   Trpm1   Mir211  Mtmr10  Fan1    Mphosph10   Mcee    Apba2   Fam189a1    Ndnl2   Tjp1    Tarsl2  Tm2d3   1810008I18Rik   Pcsk6   Snrpa1  H47 Chsy1   Lrrk1   Aldh1a3 Asb7    Lins    Lass3   Adamts17

Sample lines from @ASlist:

HSPA5   BIP FLJ26106    GRP78   MIF2        
NDUFA5  B13 CI-13KD-B   DKFZp781K1356   FLJ12147    NUFM    UQOR13
ACAN    AGC1    AGCAN   CSPG1   CSPGCP  MSK16   SEDK

The code:

my ($name) = @_;  ## this comes in from another loop elsewhere in code I did not include
chomp $name;

my @collectmatches = (); ## container to collect matches

foreach my $ASline ( @ASlist ){

    my @synonyms = split("\t", $ASline );

    for ( my $i = 0; $i < scalar @synonyms; $i++ ){
        chomp $synonyms[ $i ];
        #print "COMPARE $name TO $synonyms[ $i ]\n";

        if ( $name =~m/$synonyms[$i]/ ){
            print "\tname $name from block matches\n\t$synonyms[0]\n\tvia $synonyms[$i] from AS list\n";
            push ( @collectmatches, $synonyms[0], $synonyms[$i] );
        }
        else {
            # print "$name does not match $synonyms[$i]\n";
        }
    }
}

The script works, but it also reports unexpected matches. For example, when $name is "E030018B13Rik", it matches the "NDUFA5" row of @ASlist. These two should not match.

If I change the match from $name =~ m/$synonyms[$i]/ to $name =~ m/^$synonyms[$i]$/, the unexpected matches go away, but the script then misses the vast majority of real matches.

Answers (4)

Borodin

Another, more Perlish way to test for string equality is to use a hash.

You don't show any real test data, but this short Perl program builds a hash whose keys are all the whitespace-separated strings in the lines of your array @ASlist. After that, most of the work is done: checking a name is a single hash lookup.

The for loop that follows tests only E030018B13Rik, checks whether it is one of the keys of the new %ASlist, and prints an appropriate message.

use strict;
use warnings;

my @ASlist = (
    'HSPA5   BIP FLJ26106    GRP78   MIF2',
    'NDUFA5  B13 CI-13KD-B   DKFZp781K1356   FLJ12147    NUFM    UQOR13',
    'ACAN    AGC1    AGCAN   CSPG1   CSPGCP  MSK16   SEDK',
);

# Split every row on whitespace and make each name or synonym a hash key.
my %ASlist = map { $_ => 1 } map /\S+/g, @ASlist;

for (qw/ E030018B13Rik /) {
  printf "%s %s\n", $_, $ASlist{$_} ? 'matches' : 'doesn\'t match';
}

output

E030018B13Rik doesn't match
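The hash above records only whether a string appears somewhere in @ASlist, not which row it came from, which the question's @collectmatches needs. A minimal sketch of one way to extend it, assuming whitespace-separated cells as in the sample rows (the names %rows_for and @cells are illustrative, not part of the original answer): map each string to the primary names of the rows that contain it.

```perl
use strict;
use warnings;

# Sample rows copied from the question (whitespace-separated here).
my @ASlist = (
    'HSPA5   BIP FLJ26106    GRP78   MIF2',
    'NDUFA5  B13 CI-13KD-B   DKFZp781K1356   FLJ12147    NUFM    UQOR13',
    'ACAN    AGC1    AGCAN   CSPG1   CSPGCP  MSK16   SEDK',
);

# Map each cell (primary name or synonym) to the list of primary names
# of the rows it appears in, so a lookup also reports which row matched.
my %rows_for;
for my $line (@ASlist) {
    my @cells = $line =~ /\S+/g;
    next unless @cells;
    push @{ $rows_for{$_} }, $cells[0] for @cells;
}

for my $name (qw/ E030018B13Rik B13 AGCAN /) {
    if (my $rows = $rows_for{$name}) {
        printf "%s matches %s\n", $name, join(', ', @$rows);
    }
    else {
        printf "%s doesn't match\n", $name;
    }
}
```

Storing an array per key keeps every row when the same synonym appears in more than one row; a plain `$_ => $cells[0]` mapping would silently keep only the last one.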

choroba

When the loop reaches the NDUFA5 row, you are using B13 as the regular expression. None of its characters has a special meaning, so any string containing the substring B13 matches the expression.

E030018B13Rik
       ^^^

If you want the expression to match the whole string, use anchors:

if ($name =~m/^$synonyms[$i]$/) {
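A small self-contained sketch of the difference, using the two strings from the question: the unanchored pattern finds B13 inside E030018B13Rik, while the anchored one requires the whole string to be B13.

```perl
use strict;
use warnings;

my $name    = 'E030018B13Rik';   # incoming name from the question
my $synonym = 'B13';             # synonym cell from the NDUFA5 row

# Unanchored: succeeds if 'B13' occurs anywhere inside $name.
my $unanchored = $name =~ m/$synonym/ ? 1 : 0;

# Anchored: succeeds only if $name is exactly 'B13' (plus an optional trailing newline).
my $anchored = $name =~ m/^$synonym$/ ? 1 : 0;

print "unanchored: $unanchored\n";   # prints 1
print "anchored:   $anchored\n";     # prints 0
```

Because `$` also matches just before a final newline, the anchored form tolerates an unchomped name, but stray trailing spaces, tabs or `\r` in either string make it fail for that cell.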

Alternatively, since your data doesn't seem to use any regular-expression features, use index to detect substrings or eq to detect identical strings.
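A minimal sketch of those two non-regex tests on the same pair of strings; index returns the zero-based position of the first occurrence, or -1 if there is none.

```perl
use strict;
use warnings;

my $name    = 'E030018B13Rik';
my $synonym = 'B13';

# Substring test without any regex: index returns -1 when $synonym is absent.
my $contains  = index($name, $synonym) >= 0;

# Whole-string equality.
my $identical = $name eq $synonym;

printf "contains: %s, identical: %s\n",
    $contains  ? 'yes' : 'no',
    $identical ? 'yes' : 'no';       # prints "contains: yes, identical: no"
```

Only the eq test gives the behaviour the question is after; index reproduces the same false positive as the unanchored regex.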

Miller

The NDUFA5 record contains B13 as a pattern, which will match E030018<B13>Rik.

If you want a more literal match, add word-boundary assertions to your regular expression: /\b...\b/. You should also escape regular-expression special characters, using quotemeta or its inline form \Q...\E:

if ( $name =~ m/\b\Q$synonyms[$i]\E\b/ ) {
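A sketch comparing the bare word-boundary pattern with the quotemeta version. The first two pairs come from the question; A.B and AXB are hypothetical strings chosen only because `.` is a regex metacharacter, and raw_match and word_match are illustrative helper names.

```perl
use strict;
use warnings;

# Word boundaries only: the synonym is still interpreted as a regex.
sub raw_match {
    my ($name, $synonym) = @_;
    return $name =~ m/\b$synonym\b/ ? 1 : 0;
}

# Word boundaries plus \Q...\E, so metacharacters in the synonym are literal.
sub word_match {
    my ($name, $synonym) = @_;
    return $name =~ m/\b\Q$synonym\E\b/ ? 1 : 0;
}

for my $pair ( [ 'E030018B13Rik', 'B13' ], [ 'B13', 'B13' ], [ 'AXB', 'A.B' ] ) {
    my ($name, $synonym) = @$pair;
    printf "%-14s %-4s raw=%d quoted=%d\n",
        $name, $synonym, raw_match($name, $synonym), word_match($name, $synonym);
}
```

Note that `\b` only separates word characters (letters, digits, underscore) from everything else, so a name such as X-B13 would still match B13. When the whole cell must match, eq is the stricter test.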

Or, if you want to test for plain equality, just use eq:

if ( $name eq $synonyms[$i] ) {

Casimir et Hippolyte

Since you only need to compare two strings, you can simply use eq:

if ( $name eq $synonyms[$i] ){
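A minimal sketch of the question's loop rewritten around eq. Assumptions: the cells are tab-separated, as the question's split implies; find_matches is a hypothetical name for the subroutine; and trailing whitespace is trimmed (including a `\r` left by Windows line endings, which chomp does not remove when the file is read on a Unix-like system) because it would otherwise defeat an exact comparison.

```perl
use strict;
use warnings;

# Sample rows from the question, written here with explicit tabs.
my @ASlist = (
    "HSPA5\tBIP\tFLJ26106\tGRP78\tMIF2",
    "NDUFA5\tB13\tCI-13KD-B\tDKFZp781K1356\tFLJ12147\tNUFM\tUQOR13",
    "ACAN\tAGC1\tAGCAN\tCSPG1\tCSPGCP\tMSK16\tSEDK",
);

# Hypothetical subroutine: returns (primary name, matching cell) pairs
# for every cell in @ASlist that is exactly equal to $name.
sub find_matches {
    my ($name) = @_;
    $name =~ s/\s+\z//;                     # drop trailing newline, \r, spaces
    my @collectmatches;
    for my $ASline (@ASlist) {
        my @synonyms = split /\t/, $ASline;
        s/^\s+|\s+\z//g for @synonyms;      # trim each cell
        for my $synonym (@synonyms) {
            push @collectmatches, $synonyms[0], $synonym if $name eq $synonym;
        }
    }
    return @collectmatches;
}

my @hits = find_matches("B13\n");
print "@hits\n";                            # prints "NDUFA5 B13"
my @none = find_matches('E030018B13Rik');
print scalar(@none), "\n";                  # prints 0
```

Unlike the regex version, this returns nothing for E030018B13Rik, and a name that is a row's primary entry comes back paired with itself, just as in the original loop.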
