Reputation: 25177
I run a simple program:
$_ = '/login/.htaccess/.htdf';
s!(/\.ht.*?)$!/!;
print "$_ $1";
Output:
/login/ /.htaccess/.htdf
I want this regex to match only /.htdf.
Example 2:
$_ = 'abcbc';
m/(b.*?)$/;
print "$_ $1\n";
Output:
abcbc bcbc
I expect bc.
Why is *? still greedy? (I want the minimal match.)
Upvotes: 8
Views: 188
Reputation: 386246
Atoms are matched in sequence, and each atom after the first must match at the position where the previous atom left off matching. (The first atom is implicitly preceded by \A(?s:.)*?.) That means that .* or .*? doesn't get to decide where it starts matching; it only gets to decide where it stops matching.
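To check the claim about the implicit prefix, here is a small Perl sketch (the variable names are illustrative) that runs the question's second pattern as written and again with \A(?s:.)*? spelled out. Both should capture the same text, because the lazy prefix is what makes the engine settle on the leftmost starting point.

```perl
use strict;
use warnings;

my $str = 'abcbc';

# The pattern from the question.
my ($plain) = $str =~ /(b.*?)$/;

# The same pattern with the implicit prefix written out. The lazy prefix
# skips as few characters as it can, so the match begins at the leftmost
# position from which the rest of the pattern succeeds.
my ($explicit) = $str =~ /\A(?s:.)*?(b.*?)$/;

print "$plain $explicit\n";    # both are "bcbc"
```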
It's not being greedy. \.ht brings the match to position 10 (offsets count from 0), and at position 10, the minimum .*? can match and still have the rest of the pattern match is access/.htdf. In fact, it's the only thing .*? can match at position 10 and still have the rest of the pattern match.
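One way to see these positions is Perl's @- and @+ arrays, which hold the start and end offsets of the whole match and of each capture group. In the sketch below, splitting the question's pattern into two groups is my own change, made only for inspection; it does not change what matches.

```perl
use strict;
use warnings;

my $str = '/login/.htaccess/.htdf';
my ($start, $lazy_start, $lazy) = (-1, -1, '');

# Same pattern as the question, with /\.ht and .*? in separate groups.
if ($str =~ m{(/\.ht)(.*?)$}) {
    $start      = $-[0];    # offset where the whole match begins
    $lazy_start = $-[2];    # offset where .*? is forced to begin
    $lazy       = $2;       # what .*? had to consume to reach $
}

print "match starts at $start\n";
print ".*? starts at $lazy_start and matches '$lazy'\n";
```

This prints a match start of 6 (the / after login) and shows .*? starting at 10 with access/.htdf as its only possible value.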
I think you want to remove the last part of the path if it starts with .ht, leaving the preceding / in place. For that, you can use either of the following:
s{/\.ht[^/]*$}{/}
or
s{/\K\.ht[^/]*$}{}
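A minimal Perl check of both substitutions on the question's string (the copy-then-substitute idiom and variable names are mine). Because [^/]* cannot cross a slash, only the final path segment can satisfy the $ anchor.

```perl
use strict;
use warnings;

my $path = '/login/.htaccess/.htdf';

# Variant 1: replace "/.ht..." in the last segment and put the "/" back.
(my $v1 = $path) =~ s{/\.ht[^/]*$}{/};

# Variant 2: \K keeps the "/" out of the replaced text, so nothing is put back.
(my $v2 = $path) =~ s{/\K\.ht[^/]*$}{};

print "$v1\n$v2\n";    # both are "/login/.htaccess/"
```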
As for Example 2, it's not being greedy either. b brings the match to position 2, and at position 2, the minimum .*? can match and still have the rest of the pattern match is cbc. In fact, it's the only thing .*? can match at position 2 and still have the rest of the pattern match.
You are probably looking for
/b[^b]*$/
or
/b(?:(?!b).)*$/ # You'd use this if "b" was really more than one char.
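A short Perl sketch comparing the two suggestions on 'abcbc'. I have added a capture group to each so the result can be printed, and the last line applies the tempered form to the multi-character .ht from the first example; those additions are illustrative, not part of the answer.

```perl
use strict;
use warnings;

my $str = 'abcbc';

# [^b]* cannot step over another "b", so only the last "b" can reach $.
my ($char_class) = $str =~ /(b[^b]*)$/;

# Tempered token: consume a character only if "b" does not start there.
# Unlike a character class, this also works when "b" is a longer string.
my ($tempered) = $str =~ /(b(?:(?!b).)*)$/;

# The same idea with the multi-character ".ht" from the first example.
my ($multi) = '/login/.htaccess/.htdf' =~ m{(\.ht(?:(?!\.ht).)*)$};

print "$char_class $tempered $multi\n";    # bc bc .htdf
```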
Upvotes: 8
Reputation:
The regex works the way you wrote it.
But if you want to use the dot metacharacter, it must be greedy, so that the leading .* runs to the last / before giving anything back.
This should work: s!.*/\K\.ht.*$!!
It lops off the trailing .ht... segment.
If you want to be specific, you'd need s!/\K\.htdf$!!
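A minimal Perl sketch applying both substitutions from this answer to the question's string (variable names are illustrative). The greedy .* first consumes the whole string and then backtracks, so the first / followed by .ht that it reaches is the last one.

```perl
use strict;
use warnings;

my $path = '/login/.htaccess/.htdf';

# Greedy .* backtracks from the end, so \K lands after the last "/"
# that is followed by ".ht"; everything from there on is removed.
(my $generic = $path) =~ s!.*/\K\.ht.*$!!;

# The literal form only removes a final ".htdf".
(my $specific = $path) =~ s!/\K\.htdf$!!;

print "$generic\n$specific\n";    # both are "/login/.htaccess/"
```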
Upvotes: 0
Reputation: 507
Why shouldn't it? Greediness works in the forward direction, not backwards. In non-greedy mode, the engine tries the rest of the pattern after each character it consumes instead of consuming everything first and then backtracking, but that does not guarantee you the "minimal match": the match still starts at the leftmost position where it can succeed.
Maybe you want to avoid matching /? As in s{/\.ht[^/]*$}{/}.
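The forward-only nature of laziness shows up clearly if the start of the pattern stays the same and only what follows .*? changes. In this Perl sketch, the first two patterns are illustrative and the third is the one from the question; all three start at the same "b".

```perl
use strict;
use warnings;

my $str = 'abcbc';

# Same leading "b", three different things to the right of .*?.
my @patterns = (qr/(b.*?)/, qr/(b.*?c)/, qr/(b.*?)$/);
my (@captures, @starts);

for my $re (@patterns) {
    next unless $str =~ $re;
    push @captures, $1;       # what the group matched
    push @starts,   $-[1];    # where the group started
}

print "$captures[$_] starts at $starts[$_]\n" for 0 .. $#captures;
```

Every match begins at offset 1; .*? only decides how far it extends, which grows from nothing to "cbc" as the requirement on its right gets stricter.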
Upvotes: 1
Reputation: 785406
You can use a negative lookahead for this:
m~/(\.ht(?!.*\.ht).*)$~
(?!.*\.ht) is a negative lookahead that makes sure there is no .ht occurrence after the matched .ht, thus making sure only the last .ht is matched.
.*? is non-greedy only with respect to whatever pattern follows it on the right.
Code:
my $str = '/login/.htaccess/.htdf';
$str =~ s~/(\.ht(?!.*\.ht).*)$~/~m;
print "$str\n";
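The same lookahead can be used in a plain match to see which .ht it settles on. In this Perl sketch (variable names are mine), the .ht of .htaccess is rejected because (?!.*\.ht) can still see the later .ht, so the capture begins at the last one.

```perl
use strict;
use warnings;

my $str = '/login/.htaccess/.htdf';

# Match form of the lookahead pattern from this answer.
my ($last) = $str =~ m~/(\.ht(?!.*\.ht).*)$~;
my $offset = $-[1];    # start offset of the captured group

print "captured '$last' at offset $offset\n";
```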
Upvotes: 1