Reputation: 1276
I have this text, $line = "config.txt.1", and I want to match it with a regex and extract the number part of it. I am using three versions:
my $line = "config.txt.1";
(my $result1) = $line =~ /(\d*).*/; #ver 1, matched, but captures an empty string
(my $result2) = $line =~ /(\d).*/;  #ver 2, matched, returns 1
(my $result3) = $line =~ /(\d+).*/; #ver 3, matched, returns 1
I think the * is messing things up. I have been looking at this, but I still don't understand the greedy mechanism in the regex engine. Reading the regex from the left, I know the text might contain no digits at all; in that case ver 1 still matches, but ver 3 does not. Can someone explain why that is, and how I should write the regex for what I want? (The number may have more than one digit.)
Edit
Requirement: the number may or may not be present and is not necessarily a single digit. When there is no number, the match may capture nothing, but it must not fail.
The output must be as follows (for the above example):
config.txt
1
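For reference, here is one way to produce exactly that output. It is a minimal sketch that assumes the number, when present, is the last dot-separated part of the string; split_name is a hypothetical helper name, not something from the posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "name.ext[.N]" into the base name and an
# optional trailing number (assumed to be the last dot-separated part).
sub split_name {
    my ($line) = @_;
    # (.*?) takes as little as possible; the optional group then takes
    # ".digits" only when those digits run to the end of the string.
    my ($name, $num) = $line =~ /^(.*?)(?:\.(\d+))?$/;
    return ($name, $num);    # $num is undef when there is no number
}

my ($name, $num) = split_name("config.txt.1");
print "$name\n";
print defined $num ? "$num\n" : "(no number)\n";
```

For "config.txt" the same call returns the whole string as the name and undef as the number, so the match never fails.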
Upvotes: 0
Views: 93
Reputation: 6578
Use the literal '.' as the reference point that comes before the number:
#!/usr/bin/perl
use strict;
use warnings;
my @line = qw(config.txt file.txt config.txt.1 config.foo.2 config.txt.23 differentname.fsdfsdsdfasd.2444);
my (@capture1, @capture2);
foreach (@line){
    # "name.ext": the first two dot-separated words
    my (@filematch) = ($_ =~ /(\w+\.\w+)/);
    # after "name.ext", an optional "." and a possibly empty run of digits,
    # so lines without a number capture "" and the arrays stay aligned
    my (@numbermatch) = ($_ =~ /\w+\.\w+\.?(\d*)/);
    push @capture1, @filematch;
    push @capture2, @numbermatch;
}
print "$capture1[$_]\t$capture2[$_]\n" for 0 .. $#capture1;
Output:
config.txt
file.txt
config.txt 1
config.foo 2
config.txt 23
differentname.fsdfsdsdfasd 2444
Upvotes: 2
Reputation: 67890
You do not need .* at all. These two statements assign the exact same number:
my ($match1) = $str =~ /(\d+).*/;
my ($match2) = $str =~ /(\d+)/;
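A regex matches anywhere in the string by default, so you do not need to add wildcards.
The reason your first match does not capture a number is that * can also match zero times. The engine tries each starting position from left to right, and at the very first one (the c of config) \d* already succeeds by matching zero digits, so it never has to reach your number. That is why \d* is the wrong quantifier in that regex; removing the trailing .* would not change the result. Unless something is truly optional, use + instead.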
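A quick way to check how much the quantifier and the trailing .* each contribute is to run all four combinations against the example string. This is a minimal sketch; first_capture is a hypothetical helper that returns the first capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: first capture of $re against $str (undef on failure).
sub first_capture {
    my ($str, $re) = @_;
    my ($capture) = $str =~ $re;
    return $capture;
}

my $line = "config.txt.1";
my @patterns = (
    ['(\d*).*', qr/(\d*).*/],
    ['(\d*)',   qr/(\d*)/],
    ['(\d+).*', qr/(\d+).*/],
    ['(\d+)',   qr/(\d+)/],
);

for my $p (@patterns) {
    my ($label, $re) = @$p;
    # \d* succeeds with zero digits at offset 0, so it captures "";
    # \d+ needs at least one digit, so the engine moves on to offset 11.
    printf "%-8s captured '%s'\n", $label, first_capture($line, $re);
}
```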
Upvotes: 1
Reputation: 46197
To capture all digits following a final period (.) and not fail the match if the string doesn't end with digits, use /(?:\.(\d+))?$/
perl -E 'if ("abc.123" =~ /(?:\.(\d+))?$/) { say "matched $1" } else { say "match failed" }'
matched 123
perl -E 'if ("abc" =~ /(?:\.(\d+))?$/) { say "matched $1" } else { say "match failed" }'
matched
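Since the match always succeeds, the way to tell "no number" apart from a number is to check whether the capture is defined. A minimal sketch using the same pattern; trailing_number is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: the trailing number, or undef when there is none.
# The optional group lets the match succeed even without digits; in that
# case the group does not participate and the capture is undef.
sub trailing_number {
    my ($str) = @_;
    my ($num) = $str =~ /(?:\.(\d+))?$/;
    return $num;
}

for my $name (qw(config.txt config.txt.1 config.txt.23)) {
    my $num = trailing_number($name);
    print "$name => ", (defined $num ? $num : 'no number'), "\n";
}
```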
Upvotes: 1
Reputation: 1276
Thanks guys, I think I figured out what I want myself:
my ($match) = $line =~ /(?:\.(\d+))?$/; # captures the trailing number if there is one,
# and still matches (leaving $match undef)
# if there isn't one
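One thing worth checking with any variant of this: without an end anchor, a pattern such as /\.(\d+)?/ is already satisfied at the first '.' in config.txt.1 (the one before txt), where the optional group matches nothing, so it captures undef rather than 1. A small sketch comparing it with the end-anchored /(?:\.(\d+))?$/ on the example string:

```perl
use strict;
use warnings;

my $line = "config.txt.1";

# Unanchored: the engine stops at the FIRST "." (offset 6), where (\d+)?
# is allowed to match nothing, so the match succeeds with an undef capture.
my ($unanchored) = $line =~ /\.(\d+)?/;
print defined $unanchored ? "unanchored: $unanchored\n" : "unanchored: undef\n";

# Anchored to the end: the engine has to move on to the final ".1".
my ($anchored) = $line =~ /(?:\.(\d+))?$/;
print defined $anchored ? "anchored: $anchored\n" : "anchored: undef\n";
```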
Upvotes: 1
Reputation: 57640
The regex /(\d*).*/ always matches immediately, because it can match zero characters. It translates to: match as many digits at this position as possible (zero or more), then match as many non-newline characters as possible. The match starts by looking at the c of config, where it matches zero digits, so the whole match succeeds right there and the capture is empty.
You probably want to use a regex like /\.(\d+)$/, which matches an integer between a period (.) and the end of the string.
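To see where each match actually starts, Perl's @- array (documented in perlvar) holds the match offsets, with $-[0] being the start of the whole match. A minimal sketch; match_info is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: (start offset, first capture), or () if no match.
sub match_info {
    my ($str, $re) = @_;
    return unless $str =~ $re;    # empty list on failure
    return ($-[0], $1);           # $-[0]: offset where the match started
}

my @cases = (
    ['config.txt.1', qr/(\d*).*/],
    ['config.txt.1', qr/\.(\d+)$/],
    ['config.txt',   qr/\.(\d+)$/],
);

for my $case (@cases) {
    my ($str, $re) = @$case;
    my @info = match_info($str, $re);
    if (@info) {
        print "$str: match starts at offset $info[0], captured '$info[1]'\n";
    } else {
        print "$str: no match\n";
    }
}
```

The first pattern starts at offset 0 with an empty capture, while /\.(\d+)$/ starts at offset 10 and captures 1. Note that /\.(\d+)$/ fails outright on config.txt, so on its own it suits inputs that always end in a number, which the question's edit does not guarantee.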
Upvotes: 2