Olha Kholod

Reputation: 559

Perl: match regex from the file

I have a tab-delimited file that contains information about itemsets. Each itemset consists of one to three items:

MTMR14_Q1   NOTCH1_Q3   PRKCD_Q1        
MTMR14_Q1   NOTCH1_Q3   TFRC_Q3     
MTMR14_Q1   NOTCH1_Q3           
MTMR14_Q1           
MTMR14_Q1   PASD1_Q3

My goal is to retrieve itemsets with three items only:

MTMR14_Q1   NOTCH1_Q3   PRKCD_Q1        
MTMR14_Q1   NOTCH1_Q3   TFRC_Q3 

I wrote the following code, but it does not retrieve any itemsets:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;

while (<FILE>) {
    my $seq = $_;
    chomp $seq;
        
    if ($seq =~ /[A-Z]\t[A-Z]\t[A-Z]/) {  
        #using the binding operator to match a string to a regular expression
    
        print $seq . "\n";
    }
}

close FILE;

Could you please pinpoint my error?

Upvotes: 1

Views: 291

Answers (1)

ikegami

Reputation: 385789

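[A-Z] matches a single uppercase letter, so /[A-Z]\t[A-Z]\t[A-Z]/ requires the middle field to be exactly one letter. None of your lines look like that.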
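A quick check of the question's pattern, as a minimal sketch. It assumes the fields are separated by single tabs; the "A\tB\tC" line is made up for contrast and is not from the question's data.

```perl
use strict;
use warnings;

# One line from the question, with fields separated by single tabs (assumed).
my $line = "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1";

# The original pattern needs a single letter as the entire middle field,
# with a letter directly before the first tab and after the second.
my $orig_matches_sample = $line =~ /[A-Z]\t[A-Z]\t[A-Z]/ ? 1 : 0;

# A hypothetical line whose fields are one letter each does match.
my $orig_matches_single = "A\tB\tC" =~ /[A-Z]\t[A-Z]\t[A-Z]/ ? 1 : 0;

print "sample line matches: $orig_matches_sample\n";         # prints 0
print "single-letter line matches: $orig_matches_single\n";  # prints 1
```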


Skip lines that don't contain exactly 3 fields:

next if $seq !~ /^ [^\t]* \t [^\t]* \t [^\t]* \z/x;

[^\t]* matches any number of non-tab characters.


Skip lines that don't contain exactly 3 non-empty fields:

next if $seq !~ /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x;

[^\t]+ matches any one-or-more non-tab characters.
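To see how the two patterns differ, here is a small sketch that runs both against the lines from the question. It assumes the lines have already been chomped and carry no trailing tabs; the last line, with an empty middle field, is a hypothetical addition for illustration.

```perl
use strict;
use warnings;

my @lines = (
    "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1",
    "MTMR14_Q1\tNOTCH1_Q3\tTFRC_Q3",
    "MTMR14_Q1\tNOTCH1_Q3",
    "MTMR14_Q1",
    "MTMR14_Q1\tPASD1_Q3",
    "MTMR14_Q1\t\tPRKCD_Q1",    # hypothetical: empty middle field
);

# Exactly 3 fields, each possibly empty.
my @three_fields   = grep { /^ [^\t]* \t [^\t]* \t [^\t]* \z/x } @lines;

# Exactly 3 fields, none of them empty.
my @three_nonempty = grep { /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x } @lines;

print scalar(@three_fields),   " lines have exactly 3 fields\n";            # 3
print scalar(@three_nonempty), " lines have exactly 3 non-empty fields\n";  # 2
```

If the real file does have trailing tabs, each one adds an empty trailing field, and neither pattern would then match those lines.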


Presumably, you'll be following up by parsing the lines to get the three fields. If so, you could parse first and check after, like the following does:

my @fields = split /\t/, $seq, -1;

next if @fields != 3;                    # Require exactly 3 fields.

next if ( grep length, @fields ) != 3;   # Require exactly 3 non-empty fields.
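Put together, the split-based check might look like the sketch below. To stay self-contained it reads the question's sample data from an in-memory filehandle instead of a file named on the command line, and it assumes there are no trailing tabs; @kept is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Sample data from the question, tab-delimited, assuming no trailing tabs.
my @sample = (
    "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1",
    "MTMR14_Q1\tNOTCH1_Q3\tTFRC_Q3",
    "MTMR14_Q1\tNOTCH1_Q3",
    "MTMR14_Q1",
    "MTMR14_Q1\tPASD1_Q3",
);
my $data = join( "\n", @sample ) . "\n";

# Read the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @kept;
while ( my $seq = <$fh> ) {
    chomp $seq;

    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @fields = split /\t/, $seq, -1;

    next if @fields != 3;                    # Require exactly 3 fields.
    next if ( grep length, @fields ) != 3;   # Require all 3 to be non-empty.

    push @kept, $seq;
}
close $fh;

print "$_\n" for @kept;
```

With a real file, the in-memory open could be replaced by open my $fh, '<', $input or die "Cannot open $input: $!"; the loop stays the same.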

Upvotes: 3
