With reference to: perl string catenation and substitution in a single line?
Given an input of:
home/////test/tmp/
And a desired transform to:
/home/test/tmp/
(and other file-path-like patterns that need leading and trailing slashes but no doubled slashes; e.g. /home/test/tmp/ passes through unchanged, but /home/test/tmp gets a trailing slash, etc.)
Using three separate substitutions:
s,^/*,/,; #prefix
s,/*$,/,; #suffix
s,/+,/,g; #double slashes anywhere else.
This gives the right result:
#!/usr/bin/env perl
use strict;
use warnings;
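my $str = 'home/////test/tmp/';
$str =~ s,^/*,/,;   # prefix: replace any leading slashes with exactly one
$str =~ s,/*$,/,;   # suffix: replace any trailing slashes with exactly one
$str =~ s,/+,/,g;   # collapse double slashes anywhere else
print "$str\n";     # prints /home/test/tmp/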
But if I try to combine these patterns using alternation:
s,(^/*|/+|/*$),/,g
it looks like it should work, but it doesn't: I get a doubled trailing slash.
But if I add a zero-width word-boundary assertion, it works fine:
s,(^/*|/+|\b/*$),/,g;
Can anyone help me understand what's happening differently in the alternation group, and whether there is a possible gotcha with just leaving that \b in there?
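A minimal reproduction of both alternation variants on the two sample inputs from the question. The helper names `combined` and `combined_with_b` are hypothetical, introduced only for this sketch; the substitutions themselves are copied from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the plain alternation from the question.
sub combined {
    my ($path) = @_;
    $path =~ s,(^/*|/+|/*$),/,g;
    return $path;
}

# Hypothetical helper: the same alternation with \b before the end branch.
sub combined_with_b {
    my ($path) = @_;
    $path =~ s,(^/*|/+|\b/*$),/,g;
    return $path;
}

for my $in ('home/////test/tmp/', 'home/test/tmp') {
    print "$in\n";
    print "  plain alternation: ", combined($in), "\n";
    print "  with \\b:           ", combined_with_b($in), "\n";
}
```

On the first input the plain version yields /home/test/tmp// while the \b version yields /home/test/tmp/; on the second input both yield /home/test/tmp/.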
Upvotes: 3
Views: 136
The reason is that, under /g, the /+ alternative matches the last slash, and the search then goes on, since the $-anchored alternative can still match. It continues from the position after the last substitution, that is, after the last slash. There /*$ matches zero slashes at $ and adds another /.
We can see this with
perl -wE'
$_ = "home/dir///end/";
while (m{( ^/* | /+ | /*$ )}gx) { say "Got |$1| at ", pos }
'
which prints (with the "at ..." parts aligned for readability)
Got ||    at 0
Got |/|   at 5
Got |///| at 11
Got |/|   at 15
Got ||    at 15
With the actual substitution
s{( ^/* | /+ | /*$ )}{ say "Got |$1| at ", pos; q(/) }egx
the numbers differ, as they refer to positions in the intermediate strings, but the last two lines of output,
...
Got |/|   at 14
Got ||    at 15
are telling.
I don't see what can go wrong with having \b, either as in the question or as /*\b$.
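One edge case worth checking: at the end of a string, \b only holds when the last character is a word character ([A-Za-z0-9_]), so a path ending in something like `.` or `~` gets no trailing slash from the \b variant. The sketch below compares it with a variant whose end branch is "end of string not preceded by a slash", borrowing the `(?<!/)` lookbehind idea and using `\z` instead of `$`. The helpers `with_b` and `with_lookbehind` are hypothetical names for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the \b variant from the question.
sub with_b {
    my ($path) = @_;
    $path =~ s,(^/*|/+|\b/*$),/,g;
    return $path;
}

# Hypothetical helper: same alternation, but the end branch is a
# lookbehind ("previous char is not /") plus \z instead of \b.
sub with_lookbehind {
    my ($path) = @_;
    $path =~ s,(^/*|/+|(?<!/)\z),/,g;
    return $path;
}

for my $in ('home/test/tmp', 'home/test/.', 'home/test/tmp~') {
    printf "%-15s \\b: %-17s lookbehind: %s\n",
        $in, with_b($in), with_lookbehind($in);
}
```

For 'home/test/.' the \b version leaves /home/test/. without a trailing slash, whereas the lookbehind version produces /home/test/./.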
This is an interesting question, but I'd like to add that all these details are avoided by splitting on / and rejoining the non-empty parts:
$_ = '/' . (join '/', grep { /./ } split '/', $_) . '/' for @paths;
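A sketch of the same split/join idea wrapped in a function. Assumptions: `normalize_path` is a hypothetical name, `length` replaces `/./` as the "keep non-empty components" test, and an explicit guard returns `/` when there are no components at all, since for input such as `/` the one-liner as written computes `'/' . '' . '/'`, i.e. `//`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split on '/', keep non-empty components,
# and rejoin them with exactly one slash on each side.
sub normalize_path {
    my ($path) = @_;
    my @parts = grep { length } split m{/}, $path;
    # No components (e.g. '/' or ''): avoid producing '//'.
    return '/' unless @parts;
    return '/' . join('/', @parts) . '/';
}

my @paths = ('home/////test/tmp/', 'home/test/tmp', '/', '');
print normalize_path($_), "\n" for @paths;
```

Because nothing here depends on regex anchors, the zero-length-match subtleties discussed above do not arise.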
Upvotes: 2
Here is a single regex that does it all:
# -l chomps the newline that <<< appends, so $ only matches at the real end of the path
s='home/////test/tmp/'
perl -lpe 's~^(?!/)|(?<!/)$|/{2,}~/~g' <<< "$s"
/home/test/tmp/
s='home/test/tmp'
perl -lpe 's~^(?!/)|(?<!/)$|/{2,}~/~g' <<< "$s"
/home/test/tmp/
Regex breakdown:
^(?!/) # Line start if not followed by /
|
(?<!/)$ # Line end if not preceded by /
|
/{2,} # 2 or more /
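The same substitution inside a script, as a sketch. `normalize` is a hypothetical helper name. One detail worth knowing: `$` also matches at the very end of a string that ends in a newline, so if input lines keep their newline (plain `-p` without `-l`), the `(?<!/)$` branch can fire both before and after the newline. The last two lines of the sketch show that case; the other inputs are assumed to carry no trailing newline.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper applying the single substitution from the answer.
sub normalize {
    my ($path) = @_;
    $path =~ s~^(?!/)|(?<!/)$|/{2,}~/~g;
    return $path;
}

for my $in ('home/////test/tmp/', 'home/test/tmp', '/home/test/tmp/', '/') {
    print normalize($in), "\n";
}

# With a trailing newline left on the input, $ matches before the newline
# and again at the very end, so a stray slash lands after the newline.
(my $shown = normalize("home/test/tmp\n")) =~ s/\n/\\n/g;
print "$shown\n";
```

Passing chomped strings (or using `-l` on the command line) keeps the end branch to a single match.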
Upvotes: 0