Lanbo

Reputation: 15682

Matching numbers for substitution in Perl

I have this little script:

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

foreach (@list) {
    s/(\d{2}).*\.txt$/$1.txt/;
    s/^0+//;
    print $_ . "\n";
}

The expected output would be

5.txt
12.txt
1.txt

But instead, I get

R3_05.txt
T3_12.txt
1.txt

The last one is fine, but I cannot fathom why the start of the string is kept in front of $1 in the other two cases.

Upvotes: 3

Views: 96

Answers (4)

Casimir et Hippolyte

Reputation: 89547

Try this pattern:

foreach (@list) {
    s/^.*?_?(?|0(\d)|(\d{2})).*\.txt$/$1.txt/;
    print $_ . "\n";
}


Explanations:

I use the branch reset feature here, i.e. (?|...()...|...()...), available since Perl 5.10. It makes the capturing groups in each alternative share the same number ($1 here), so whichever alternative matches, the captured text ends up in $1. This way you avoid a second substitution to trim the zero from the left of the capture.

To remove everything from the beginning of the string up to the number, I use:

.*?     # any characters, zero or more times
        # (the ? makes the * quantifier lazy, so it matches as little as possible)
_?      # an optional underscore



Note that you can make sure the two digits are not followed by a third one by adding a negative lookahead:

s/^.*?_?(?|0(\d)|(\d{2}))(?!\d).*\.txt$/$1.txt/;

(?!\d) means not followed by a digit.
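
As a quick check, here is a minimal sketch that runs the lookahead version of the pattern over the list from the question. The helper name shorten is only an illustration and is not part of the question; it works on a copy so the original strings stay intact.

```perl
use strict;
use warnings;
use 5.010;    # the branch reset group (?|...) needs Perl 5.10 or newer

# Hypothetical helper: returns the shortened name, leaving the input untouched.
sub shorten {
    my ($name) = @_;
    # Whichever alternative matches, the branch reset puts its capture in $1.
    $name =~ s/^.*?_?(?|0(\d)|(\d{2}))(?!\d).*\.txt$/$1.txt/;
    return $name;
}

say shorten($_) for ('R3_05_foo.txt', 'T3_12_foo_bar.txt', '01.txt');
# prints 5.txt, 12.txt and 1.txt, one per line
```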

Upvotes: 3

TLP

Reputation: 67900

The problem here is that your substitution regex does not cover the whole string: the match only starts at the first pair of digits, so everything before it (R3_ or T3_) is left untouched. But you are using a rather complex solution for a simple problem.

It seems that what you want is to read two digits from the string, and then add .txt to the end of it. So why not just do that?

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

for (@list) {
    if (/(\d{2})/) {
        $_ = "$1.txt";
    }
}

To get rid of the leading zero, you can force a conversion to a number by adding zero to the capture:

$_ = 0+$1 . ".txt";
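
Putting the two pieces together, a sketch of the full loop could look like this. The parentheses around the addition are not strictly required, since + and . share the same precedence and associate left to right, but they make the intent explicit.

```perl
use strict;
use warnings;

my @list = ('R3_05_foo.txt', 'T3_12_foo_bar.txt', '01.txt');

for (@list) {
    if (/(\d{2})/) {
        # Adding 0 turns the string "05" into the number 5.
        $_ = (0 + $1) . ".txt";
    }
}

print "$_\n" for @list;
# prints 5.txt, 12.txt and 1.txt, one per line
```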

Upvotes: 2

innaM

Reputation: 47829

The problem is that the first part of your s/// matches what you think it does, but the second part isn't replacing what you think it should. s/// only replaces the text that was actually matched. Thus, to get rid of something like T3_, you have to match that too:

s/.*(\d{2}).*\.txt$/$1.txt/;
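
To see which part of the string the original pattern actually matches, you can look at the match offsets in @- and @+. This is just a small sketch; the helper name matched_part is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the substring matched by the pattern from the question.
sub matched_part {
    my ($name) = @_;
    return unless $name =~ /(\d{2}).*\.txt$/;
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    return substr($name, $-[0], $+[0] - $-[0]);
}

print matched_part('R3_05_foo.txt'), "\n";    # prints 05_foo.txt
```

Only that matched part gets replaced by $1.txt, which is why the R3_ in front of it survives.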

Upvotes: 1

Sedi

Reputation: 71

I would modify your regular expression. Try using this code:

my @list = ('R3_05_foo.txt','T3_12_foo_bar.txt','01.txt');

foreach (@list) {
    s/.*(\d{2}).*\.txt$/$1.txt/;
    s/^0+//;
    print $_ . "\n";
}

Upvotes: 1
