Reputation: 559
I have a tab-delimited file that contains information about itemsets. Each itemset consists of one to three items:
MTMR14_Q1 NOTCH1_Q3 PRKCD_Q1
MTMR14_Q1 NOTCH1_Q3 TFRC_Q3
MTMR14_Q1 NOTCH1_Q3
MTMR14_Q1
MTMR14_Q1 PASD1_Q3
My goal is to retrieve itemsets with three items only:
MTMR14_Q1 NOTCH1_Q3 PRKCD_Q1
MTMR14_Q1 NOTCH1_Q3 TFRC_Q3
I have written the following code, but it does not retrieve any itemsets:
#!/usr/bin/perl -w
use strict;
my $input = shift @ARGV or die $!;
open (FILE, "$input") or die $!;
while (<FILE>) {
my $seq = $_;
chomp $seq;
if ($seq =~ /[A-Z]\t[A-Z]\t[A-Z]/) {
#using the binding operator to match a string to a regular expression
print $seq . "\n";
}
}
close FILE;
Could you please pinpoint my error?
Upvotes: 1
Views: 291
Reputation: 385789
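[A-Z] matches exactly one uppercase letter, so your pattern requires a field consisting of a single letter between the two tabs. None of your lines has one.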
Skip lines that don't contain exactly 3 fields:
next if $seq !~ /^ [^\t]* \t [^\t]* \t [^\t]* \z/x;
[^\t]* matches any number of non-tab characters, including none.
Skip lines that don't contain exactly 3 non-empty fields:
next if $seq !~ /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x;
[^\t]+ matches one or more non-tab characters.
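To see how the two patterns differ, here is a minimal sketch that runs both against a few lines. The helper names has_three_fields and has_three_nonempty_fields are made up for illustration, and the third sample line (with an empty middle field) is invented; the other two come from the question.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns above.
# Each returns 1 on a match and 0 otherwise.
sub has_three_fields {
    my ($line) = @_;
    return $line =~ /^ [^\t]* \t [^\t]* \t [^\t]* \z/x ? 1 : 0;
}

sub has_three_nonempty_fields {
    my ($line) = @_;
    return $line =~ /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x ? 1 : 0;
}

my @lines = (
    "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1",   # three non-empty fields
    "MTMR14_Q1\tNOTCH1_Q3",             # only two fields
    "MTMR14_Q1\t\tPRKCD_Q1",            # invented: three fields, middle one empty
);

# Print the result of each check next to the line it was applied to.
for my $line (@lines) {
    printf "%d %d %s\n",
        has_three_fields($line),
        has_three_nonempty_fields($line),
        $line;
}
```

Both patterns assume the line has already been chomped, since \z only matches at the very end of the string.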
Presumably, you'll follow up by parsing the lines to get the three fields. If so, you could parse first and check afterwards:
my @fields = split /\t/, $seq, -1;
next if @fields != 3; # Require exactly 3 fields.
next if ( grep length, @fields ) != 3; # Require exactly 3 non-empty fields.
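Putting the split-based check into a complete loop might look like the sketch below. The sub name three_item_lines is hypothetical, and the sample data from the question is read through an in-memory filehandle so the example runs without an input file; with a real file you would open it by name instead.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $fh that have
# exactly three non-empty tab-separated fields.
sub three_item_lines {
    my ($fh) = @_;
    my @kept;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\t/, $line, -1;      # -1 keeps trailing empty fields
        next if @fields != 3;                    # Require exactly 3 fields.
        next if ( grep length, @fields ) != 3;   # Require exactly 3 non-empty fields.
        push @kept, $line;
    }
    return @kept;
}

# Sample data from the question, one itemset per line.
my $data = join '', map { "$_\n" } (
    "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1",
    "MTMR14_Q1\tNOTCH1_Q3\tTFRC_Q3",
    "MTMR14_Q1\tNOTCH1_Q3",
    "MTMR14_Q1",
    "MTMR14_Q1\tPASD1_Q3",
);

# Open the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for three_item_lines($fh);
close $fh;
```

Keeping the filter in a sub that takes a filehandle means the same code works for a file on disk, STDIN, or a string, and the fields are already split if you need them later.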
Upvotes: 3