Reputation: 480
I want to capture a number contained in certain lines of a file. I am using Perl's match operator to capture the number that occurs at a specific position relative to other symbols in each line. Here is an example line:
fixedStep chrom=chr1 start=3000306 step=1
Here is the relevant portion of the script:
while ( <FILE> ) {
    if ( $_=~m/fixedStep/ ) {
        my $line = $_;
        print $line;
        my $position = ($line =~ /start\=(\d+)/);
        print "position is $position\n\n";
    }
}
$position prints as 1, not the number I need. According to the online regex tool regex101.com, the regex I am using works; it captures the appropriate element in the line.
Upvotes: 1
Views: 194
Reputation: 241978
To get the capture groups from a match, you have to evaluate the match in list context. You get list context by enclosing the scalar on the left-hand side of the assignment in parentheses:
my ($position) = $line =~ /start=(\d+)/;
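To make the difference concrete, here is a minimal sketch that runs the same match against the example line from the question, once with a scalar assignment and once with a list assignment. The variable name $matched is only illustrative.

```perl
use strict;
use warnings;

my $line = "fixedStep chrom=chr1 start=3000306 step=1";

# Scalar assignment: the match only reports whether it succeeded.
my $matched = $line =~ /start=(\d+)/;

# List assignment: the parentheses on the left make the match return its captures.
my ($position) = $line =~ /start=(\d+)/;

print "matched:  $matched\n";   # matched:  1
print "position: $position\n";  # position: 3000306
```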
Note that = is not special in regexes, so there is no need to backslash it. Also be careful with \d if your input is Unicode: it matches any Unicode decimal digit, and you probably do not want to match digits other than 0-9 (such as ௫).
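A small sketch of the \d caveat, assuming Perl 5.14 or later for the /a modifier. It tests a single Tamil digit against \d, against \d restricted by /a, and against an explicit [0-9] class. The variable names are illustrative.

```perl
use v5.14;      # the /a modifier needs Perl 5.14 or later
use warnings;

# U+0BEB TAMIL DIGIT FIVE, written as an escape so the source stays ASCII.
my $tamil_five = "\x{0BEB}";

my $unicode_d = $tamil_five =~ /\A\d\z/    ? 1 : 0;  # \d under Unicode rules
my $ascii_d   = $tamil_five =~ /\A\d\z/a   ? 1 : 0;  # /a limits \d to 0-9
my $bracket   = $tamil_five =~ /\A[0-9]\z/ ? 1 : 0;  # explicit ASCII range

print "\\d: $unicode_d, \\d with /a: $ascii_d, [0-9]: $bracket\n";
# \d: 1, \d with /a: 0, [0-9]: 0
```

If the captured value is later used as a number, [0-9] or /a is the safer choice.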
Upvotes: 6
Reputation: 1056
[ EDIT: See comments for explanation about why struck text is wrong ]
You can use
my ($position) = ($line =~ /start\=(\d+)/);
The alternative originally offered here,
my $position = $line =~ /start\=(\d+)/;
does not work: without parentheses around $position the match is evaluated in scalar context, so you get the match's success value instead of the captured number.
Upvotes: 1
Reputation: 118148
When you use my $position = ($line =~ /start\=(\d+)/); you are evaluating the match in scalar context, because of the scalar assignment on the LHS. In scalar context, the match operator returns only whether it succeeded, so $position will be either 1 or false (the empty string, which is 0 in numeric context) depending on whether this particular match succeeded.
By using my ($position) = on the LHS, you create list context. The match then returns the captured substrings, and the first one ends up in $position (if there are more, they get discarded).
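As a rough illustration of how captures are distributed in list context, the sketch below uses a pattern with three capture groups on the example line. The pattern and the names $chrom, $step and $first_only are assumptions made for the example, not part of the original script.

```perl
use strict;
use warnings;

my $line = "fixedStep chrom=chr1 start=3000306 step=1";

# Three capture groups, three variables: each gets one captured substring.
my ($chrom, $start, $step) =
    $line =~ /chrom=(\S+)\s+start=([0-9]+)\s+step=([0-9]+)/;

# Three capture groups, one variable: only the first capture is kept.
my ($first_only) =
    $line =~ /chrom=(\S+)\s+start=([0-9]+)\s+step=([0-9]+)/;

print "$chrom $start $step\n";  # chr1 3000306 1
print "$first_only\n";          # chr1
```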
Also, in general, avoid bareword filehandles such as FILE (except for special builtin ones such as DATA and ARGV). Those are package-level variables. Also, assign each line to a lexical variable in the smallest possible scope, instead of overwriting $_. In addition, the test and match can be combined, resulting in a more precise specification of the string you want to match. Of course, you know the constraints best, so, for example, if the chrom field always appears second in valid input, you should specify that.
The pattern below just requires that the line begins with fixedStep and that there is exactly one more field before the one you want to capture.
#!/usr/bin/env perl
use strict;
use warnings;
while (my $line = <DATA>) {
    if (my ($position) = ($line =~ m{
        \A
        fixedStep
        \s+ \S+ \s+
        start=([0-9]+)
    }x)) {
        print "$position\n";
    }
}
__DATA__
fixedStep chrom=chr1 start=0 step=1
fixedStep chrom=chr1 start=3000306 step=1
start=9999 -- hey, that's wrong
Output:
C:\Temp> tt
0
3000306
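For reading from a real file rather than DATA, the advice about bareword filehandles could look roughly like the sketch below. To keep it runnable on its own, it opens an in-memory string in place of a file; $contents, $fh and @positions are illustrative names, and in a real script you would pass a file path to open instead of \$contents.

```perl
use strict;
use warnings;

# Stand-in for the real input file so the sketch runs as-is.
my $contents = "fixedStep chrom=chr1 start=3000306 step=1\n0.5\n0.7\n";

# $fh is a lexical filehandle, not a package-level bareword like FILE.
open my $fh, '<', \$contents
    or die "Cannot open in-memory file: $!";

my @positions;
while (my $line = <$fh>) {
    # Test and capture in one step; $position exists only inside this if block.
    if (my ($position) = $line =~ /\AfixedStep\s.*?\bstart=([0-9]+)/) {
        push @positions, $position;
    }
}
close $fh;

print "@positions\n";   # 3000306
```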
Upvotes: 4