Reputation: 49477
I want Perl's equivalent of Python's os.path.normpath():
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]
For instance, I want to convert '/a/../b/./c//d' into '/b/c/d'.
The path I'm manipulating does NOT represent a real directory in the local file tree. There are no symlinks involved. So a plain string manipulation works fine.
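The rules quoted from the Python docs can be sketched as a pure string operation in Perl. This is a minimal sketch, assuming '/' is the only separator and that symlinks never need to be honored. The sub name normpath is only illustrative, and unlike Python's version it gives a leading '//' no special meaning:

```perl
use strict;
use warnings;

# Sketch of normpath-style normalization by string manipulation only.
# Assumptions: '/' is the only separator; symlinks are ignored.
sub normpath {
    my ($path) = @_;
    my $absolute = $path =~ m{^/};
    my @stack;
    for my $part (split m{/}, $path) {
        next if $part eq '' || $part eq '.';    # handles A//B, A/B/ and A/./B
        if ($part eq '..') {
            if (@stack && $stack[-1] ne '..') {
                pop @stack;                      # handles A/foo/../B
            }
            elsif (!$absolute) {
                push @stack, '..';               # keep leading '..' in relative paths
            }
            # In absolute paths a '..' at the root is dropped ("/.." is "/").
            next;
        }
        push @stack, $part;
    }
    my $result = ($absolute ? '/' : '') . join('/', @stack);
    return length($result) ? $result : '.';
}

print normpath($_), "\n" for '/a/../b/./c//d', 'A//B', 'A/B/', 'A/./B', 'A/foo/../B';
```

This prints /b/c/d followed by A/B four times. A stack fits here because each '..' only ever cancels the most recently kept real component.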
I tried Cwd::abs_path and File::Spec, but they don't do what I want:
use strict;
use warnings;
use Cwd;
use File::Spec;
my $path = '/a/../b/./c//d';
File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.
Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.
Cwd::fast_abs_path($path);
# Gives an error: No such file or directory
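The symlink concern mentioned in the comments above can be seen directly. The following sketch assumes a Unix-like system where symlink() works; the directory names real, sub and link are made up for the demo. It creates a symlink in a temporary directory and compares what Cwd::realpath makes of link/.. with what a purely textual removal of link/.. gives:

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Resolve the temp dir itself first, in case it lives under a symlink.
my $tmp = realpath(tempdir(CLEANUP => 1));

# Hypothetical layout for the demo: $tmp/link -> $tmp/real/sub
make_path("$tmp/real/sub");
symlink("$tmp/real/sub", "$tmp/link") or die "symlink failed: $!";

# The filesystem follows the link first, so '..' lands in $tmp/real.
my $via_fs = realpath("$tmp/link/..");

# Pure string editing just deletes 'link/..', giving $tmp.
(my $via_string = "$tmp/link/..") =~ s{/[^/]+/\.\.\z}{};

print "realpath: $via_fs\n";
print "string:   $via_string\n";
```

The two results differ only because link is a symlink; for paths that involve no symlinks, as in the question, both approaches agree.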
Upvotes: 8
Views: 2252
Reputation: 1901
You mentioned that you tried File::Spec and it didn't do what you want. That's probably because you were using it on a Unix-like system, where trying to cd to something like path/to/file.txt/.. will fail unless path/to/file.txt is a legitimate directory path.
However, the command cd path/to/file.txt/.. will work on a Win32 system, provided that path/to is a real directory path, regardless of whether file.txt is a real subdirectory.
In case you don't see where I'm going yet: the File::Spec module won't do what you want (unless you're on a Win32 system), but the module File::Spec::Win32 will. And what's cool is that File::Spec::Win32 should be available as a standard module even on non-Win32 platforms!
This code pretty much does what you want:
use strict;
use warnings;
use feature 'say';
use File::Spec::Win32;
my $path = '/a/../b/./c//d';
my $canonpath = File::Spec::Win32->canonpath($path);
say $canonpath; # This prints: \b\c\d
Unfortunately, since we're using the Win32 flavor of File::Spec, \ is used as the directory separator (instead of the Unix /). It should be trivial for you to convert those \ to /, provided that the original $path does not contain any \ to begin with.
And if your original $path does contain legitimate \ characters, it shouldn't be too difficult to figure out a way to preserve them (so that they don't get converted to /). That said, if your paths actually contain \ characters, they have probably caused you quite a few headaches already.
Since neither Unix-like systems nor Win32 allow null characters in pathnames, one way to preserve the \ characters is to first convert them to null bytes, then call File::Spec::Win32->canonpath( ... ), and then convert the null bytes back to \ characters. This is straightforward and needs no looping:
use File::Spec::Win32;
my $path = '/a/../b/./c//d';
$path =~ s[\\][\0]g; # Converts backslashes to null bytes.
$path = File::Spec::Win32->canonpath($path);
$path =~ s[\\][/]g; # Converts \ to / characters.
$path =~ s[\0][\\]g; # Converts null bytes back to backslashes.
# $path is now set to: /b/c/d
Upvotes: 1
Reputation: 2341
Fixing Tom van der Woerdt's code:
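use strict;
use warnings;

foreach my $path ("/a/b/c/d/../../../e", "/a/../b/./c//d") {
    my $absolute = $path =~ m!^/!;
    # Drop empty components (from "//" or the leading "/") up front, so the
    # look-ahead below never mistakes one for a real directory name.
    my @c = reverse grep { length } split m@/@, $path;
    my @c_new;
    while (@c) {
        my $component = shift @c;
        next if $component eq ".";
        if ($component eq "..") {
            # Find the nearest component to the left that is not "." or "..";
            # that is the directory this ".." cancels.
            my $i = 0;
            $i++ while $i <= $#c && $c[$i] =~ m/^\.{1,2}$/;
            if ($i > $#c) {
                # Nothing left to cancel: keep ".." for relative paths,
                # drop it for absolute ones ("/.." is "/").
                push @c_new, $component unless $absolute;
            } else {
                splice(@c, $i, 1);
            }
            next;
        }
        push @c_new, $component;
    }
    # Only absolute paths get the leading "/".
    print(($absolute ? "/" : "") . join("/", reverse @c_new) . "\n");
}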
Upvotes: 2
Reputation: 121
My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:
<include href="../../base/rng/topicMod.rng"/>
and I needed the absolute path of that included file. (The including file path might be absolute or relative.)
Path::Tiny was promising, but I can only use core modules.
I tried using chdir to the include file location and then File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.
I ended up writing a subroutine that implements a simple string-based method of stripping leading '../' components:
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Basename;
use File::Spec;
sub adjust_local_path {
my ($file, $relative_to) = @_;
return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!)); # handle the fast cases
$relative_to = dirname($relative_to) if (-f $relative_to);
$relative_to = Cwd::realpath($relative_to);
while ($file =~ s!^\.\./!!) { $relative_to =~ s!/[^/]+$!!; }
return File::Spec->catdir($relative_to, $file);
}
my $including_file = '/home/chrispy/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $include_href   = '.././base/rng/topicMod.rng';
# The relative include path is the first argument, the including file the second.
print adjust_local_path($include_href, $including_file)."\n";
The result of the script above is:
$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng
Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.
You can provide either a file or a path as the second argument; if it's a file, the directory path of that file is used. (This was convenient for my own purposes.)
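If symlink resolution isn't needed and the files may not exist yet, the same resolution can be sketched without touching the filesystem. The helper name resolve_href is hypothetical; it assumes the including file's path is absolute and, unlike the realpath() version above, does not resolve symlinks:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: resolve the include path $href against the directory
# of $including_file by string manipulation only. Nothing has to exist on
# disk, and symlinks are NOT resolved. Assumes $including_file is absolute.
sub resolve_href {
    my ($href, $including_file) = @_;

    # An absolute href ignores the including file's directory.
    my @dir = $href =~ m{^/}
        ? ()
        : grep { length($_) && $_ ne '.' } split m{/}, dirname($including_file);

    for my $part (split m{/}, $href) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') { pop @dir; next }    # up one level; stays at "/"
        push @dir, $part;
    }
    return '/' . join('/', @dir);
}

print resolve_href('../../base/rng/topicMod.rng',
    '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng'), "\n";
```

With the href from the question, this prints /home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/base/rng/topicMod.rng, keeping the dita-ot symlink unresolved.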
Upvotes: 2
Reputation: 6553
The Path::Tiny module does exactly this:
use strict;
use warnings;
use 5.010;
use Path::Tiny;
say path('/a/../b/./c//d');
Output:
/b/c/d
Upvotes: 0
Reputation: 49477
Given that File::Spec is almost what I needed, I ended up writing a function that removes ../ from the result of File::Spec->canonpath(). The full code including tests is available as a GitHub Gist.
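use strict;
use warnings;
use File::Spec;

sub path_normalize_by_string_manipulation {
    my $path = shift;

    # canonpath does string manipulation, but does not remove "..".
    my $ret = File::Spec->canonpath($path);

    # Let's remove ".." by using a regex.
    while ($ret =~ s{
        (^|/)              # Either the beginning of the string, or a slash, save as $1
        (                  # Followed by one of these:
            [^/]|          #  * Any one character (except slash, obviously)
            [^./][^/]|     #  * Two characters where
            [^/][^./]|     #    they are not ".."
            [^/][^/][^/]+  #  * Three or more characters
        )                  # Followed by:
        /\.\.(?:/|\z)      # "/..", then either "/" or the end of the string
    }{$1}x
    ) {
        # Repeat this substitution until not possible anymore.
    }

    # A trailing ".." (as in "/a/b/..") leaves a trailing slash behind;
    # drop it unless the result is just the root "/".
    $ret =~ s{(?<=[^/])/\z}{};

    # Everything cancelled out (as in "a/.."): use "." rather than an empty
    # string, so the trailing-slash step below cannot turn it into "/".
    $ret = '.' if $ret eq '';

    # Re-adding the trailing slash, if needed.
    if ($path =~ m!/$! && $ret !~ m!/$!) {
        $ret .= '/';
    }
    return $ret;
}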
Upvotes: 4
Reputation: 29985
Removing '.' and '..' from paths is pretty straightforward if you process the path right-to-left:
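use strict;
use warnings;
use feature 'say';

my $path = "/a/../b/./c//d";
my @c = reverse split m@/@, $path;
my @c_new;
my $skip = 0;    # number of ".." components still waiting to cancel a directory
while (@c) {
    my $component = shift @c;
    next unless length($component);
    if ($component eq ".")  { next; }
    if ($component eq "..") { $skip++; next; }
    if ($skip)              { $skip--; next; }    # cancelled by a ".." to its right
    push @c_new, $component;
}
say "/" . join("/", reverse @c_new);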
(Assumes the path starts with a /)
Note that this violates the UNIX pathname resolution standards, specifically this part:
A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
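If the leading-double-slash case matters, the right-to-left idea can be adapted to leave exactly two leading slashes alone. A minimal sketch, assuming that preserving // verbatim is the behavior you want (Python's posixpath.normpath makes the same choice); the sub name normalize_posix is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical variant of the right-to-left approach: keep a leading "//"
# (exactly two slashes) untouched, since POSIX leaves its meaning
# implementation-defined, but collapse three or more leading slashes to one.
sub normalize_posix {
    my ($path) = @_;
    my $prefix = $path =~ m{^//(?!/)} ? '//'
               : $path =~ m{^/}       ? '/'
               :                        '';
    my @kept;
    my $skip = 0;    # pending ".." components that still need a directory to cancel
    for my $part (reverse split m{/}, $path) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') { $skip++; next }
        if ($skip)         { $skip--; next }
        unshift @kept, $part;
    }
    # Uncancelled ".." survive in relative paths; at the root they are dropped.
    unshift @kept, ('..') x $skip if $prefix eq '';
    my $result = $prefix . join('/', @kept);
    return length($result) ? $result : '.';
}

print normalize_posix($_), "\n" for '//a/../b', '///a/./b', '/a/../b/./c//d';
```

This prints //b, /a/b and /b/c/d. Counting pending '..' components replaces the look-ahead, so consecutive '..' entries need no special handling.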
Upvotes: 0