ewok

Reputation: 21443

perl: add to a path using substitution

My script takes in a filepath, and I want to append a directory to the end of the path. The issue is I want to be agnostic of whether the argument has a trailing slash or not. So for example:

$ perl myscript.pl /path/to/dir
/path/to/dir/new
$ perl myscript.pl /path/to/dir/
/path/to/dir/new

I tried $path =~ s/\/?$/\/new/g, but that results in a double /new if a slash is present:

$ perl myscript.pl /path/to/dir/
/path/to/dir/new/new
$ perl myscript.pl /path/to/dir
/path/to/dir/new

What's wrong?

Upvotes: 0

Views: 53

Answers (2)

Sobrique

Reputation: 53478

Because /g is 'global' and will match multiple times:

#!/usr/bin/env perl
use strict;
use warnings;

#turn on debugging
use re 'debug';

my $path = '/path/to/dir/';
$path =~ s/\/?$/\/new/g;

print $path;

After the first replacement consumes the trailing /, the regex engine is still sitting at the "end of line" position. The optional / can match zero times there and $ still matches, so the pattern matches a second time.

E.g.:

Compiling REx "/?$"
Final program:
   1: CURLY {0,1} (5)
   3:   EXACT </> (0)
   5: SEOL (6)
   6: END (0)
floating ""$ at 0..1 (checking floating) minlen 0 
Matching REx "/?$" against "/path/to/dir/"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [0..13] gave 13
  Found floating substr ""$ at offset 13 (rx_origin now 12)...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 12
  12 <path/to/dir> </>       |  1:CURLY {0,1}(5)
                                  EXACT </> can match 1 times out of 1...
  13 <path/to/dir/> <>       |  5:  SEOL(6)
  13 <path/to/dir/> <>       |  6:  END(0)
Match successful!
Matching REx "/?$" against ""
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [13..13] gave 13
  Found floating substr ""$ at offset 13 (rx_origin now 13)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 13
  13 <path/to/dir/> <>       |  1:CURLY {0,1}(5)
                                  EXACT </> can match 0 times out of 1...
  13 <path/to/dir/> <>       |  5:  SEOL(6)
  13 <path/to/dir/> <>       |  6:  END(0)
Match successful!
Matching REx "/?$" against ""
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [13..13] gave 13
  Found floating substr ""$ at offset 13 (rx_origin now 13)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 13
  13 <path/to/dir/> <>       |  1:CURLY {0,1}(5)
                                  EXACT </> can match 0 times out of 1...
  13 <path/to/dir/> <>       |  5:  SEOL(6)
  13 <path/to/dir/> <>       |  6:  END(0)

This is because $ is a zero-width position anchor, and \/? is also zero-width when it matches nothing. Once the pattern has matched the trailing / and replaced it, the regex engine continues (because you told it to with /g) and finds only $ left, because that's still the end of line. And that's still a valid match to replace.
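To see the two matches without reading the full debug trace, here is a minimal sketch that records where each /g match starts. The helper name match_offsets is hypothetical; it uses m//g in a loop rather than s///g, on the assumption that both find the same match positions here (the offsets agree with the trace above).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the start offset of every /g match
# of the pattern /?$ against the given path.
sub match_offsets {
    my ($path) = @_;
    my @offsets;
    while ($path =~ m{/?$}g) {
        push @offsets, $-[0];    # $-[0] is the start of the last match
    }
    return @offsets;
}

# With a trailing slash: one match on the "/" and one empty match at the end.
print join(',', match_offsets('/path/to/dir/')), "\n";    # 12,13

# Without a trailing slash: only the empty match at the end.
print join(',', match_offsets('/path/to/dir')), "\n";     # 12
```

Each offset corresponds to one replacement under s///g, which is why the path with a trailing slash ends up with two copies of /new.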

But why not instead use File::Spec:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use Data::Dumper;

my $path = '/path/to/dir/';

my @dirs = File::Spec->splitdir($path);

print Dumper \@dirs;

$path = File::Spec->catdir(@dirs, "new" );
print $path;

This gives you a platform-independent way to split and join path elements, and it doesn't rely on regex matching, which can break in various ways (such as the one you found).
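If you don't need the individual components, a shorter variant is to hand the path straight to catdir, which cleans up the redundant slash while joining. This is a sketch, not the only way to do it: the helper name append_dir is hypothetical, and the results noted in the comments assume a Unix-like system, since File::Spec output depends on the platform.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: append one directory component to a path.
# catdir joins the parts and canonicalises the result, so a trailing
# slash on $path does not produce a doubled separator.
sub append_dir {
    my ($path, $name) = @_;
    return File::Spec->catdir($path, $name);
}

# On a Unix-like system both lines print /path/to/dir/new
print append_dir($_, 'new'), "\n" for '/path/to/dir', '/path/to/dir/';
```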

Upvotes: 2

Dada

Reputation: 6626

Drop the /g modifier:

$path =~ s/\/?$/\/new/

works fine.
You only want to add one "new" at the end, so having a /g modifier makes no sense.


Also, note that you can use different delimiters for your regex:

$path =~ s{ /? $}{/new}x;

is a little bit clearer.
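A quick way to confirm that the substitution without /g handles both inputs is a small sketch like this one. The helper name append_new is hypothetical; it works on a copy of the argument so the caller's string is left untouched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: append "/new" to a path, with or without a trailing slash.
sub append_new {
    my ($path) = @_;
    # No /g: replace only the first match of "optional slash at the end".
    $path =~ s{/?$}{/new};
    return $path;
}

print append_new('/path/to/dir'),  "\n";    # /path/to/dir/new
print append_new('/path/to/dir/'), "\n";    # /path/to/dir/new
```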

Upvotes: 1
