Denilson Sá Maia

Reputation: 49477

How to normalize a path in Perl? (without checking the filesystem)

I want Perl's equivalent of Python's os.path.normpath():

Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]

For instance, I want to convert '/a/../b/./c//d' into '/b/c/d'.

The path I'm manipulating does NOT represent a real directory in the local file tree, and there are no symlinks involved, so plain string manipulation works fine.
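To make the expected behaviour concrete, here is a minimal string-only sketch of the normpath rules quoted above, using a stack of path components. The name `normpath_like` is hypothetical; it follows the documented examples but is not a port of Python's implementation (for instance, it does not keep a leading `//` the way Python's posixpath does).

```perl
use strict;
use warnings;

# Hypothetical helper: collapse "//", "/./" and "dir/.." purely as strings.
sub normpath_like {
    my ($path) = @_;
    return '.' if $path eq '';
    my $absolute = $path =~ m{^/};
    my @stack;
    for my $part (split m{/}, $path) {
        next if $part eq '' or $part eq '.';    # redundant separators and "."
        if ($part eq '..') {
            if (@stack and $stack[-1] ne '..') {
                pop @stack;                      # "dir/.." cancels out
            } elsif (!$absolute) {
                push @stack, '..';               # keep leading ".." in relative paths
            }                                    # "/.." simply stays at the root
            next;
        }
        push @stack, $part;
    }
    my $result = ($absolute ? '/' : '') . join('/', @stack);
    return length $result ? $result : '.';
}

print normpath_like($_), "\n" for 'A//B', 'A/B/', 'A/./B', 'A/foo/../B', '/a/../b/./c//d';
```

Because it never consults the filesystem, the result can differ from the real path when symlinks are involved, exactly as the quoted Python documentation warns.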

I tried Cwd::abs_path and File::Spec, but they don't do what I want.

use File::Spec;
use Cwd;

my $path = '/a/../b/./c//d';

File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.

Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.

Cwd::fast_abs_path($path);
# Gives an error: No such file or directory

Possibly related link:

Upvotes: 8

Views: 2252

Answers (6)

J-L

Reputation: 1901

You mentioned that you tried File::Spec and it didn't do what you want. That's because you were probably using it on a Unix-like system, where if you try to cd to something like path/to/file.txt/.. it will fail unless path/to/file.txt is a legitimate directory path.

However, the command cd path/to/file.txt/.. will work on a Win32 system, provided that path/to is a real directory path -- regardless of whether file.txt is a real subdirectory.

In other words, File::Spec won't do what you want (unless you're on a Win32 system), but File::Spec::Win32 will. Conveniently, File::Spec::Win32 ships as a standard module, so it is available even on non-Win32 platforms.

This code pretty much does what you want:

use strict;
use warnings;
use feature 'say';

use File::Spec::Win32;

my $path = '/a/../b/./c//d';
my $canonpath = File::Spec::Win32->canonpath($path);
say $canonpath;   # This prints:  \b\c\d

Unfortunately, since we're using the Win32 flavor of File::Spec, the \ is used as the directory separator (instead of the Unix /). It should be trivial for you to convert those \ to /, provided that the original $path does not contain any \ to begin with.

And if your original $path does contain legitimate \ characters, it shouldn't be too difficult to find a way to preserve them (so that they don't get converted to /). That said, if your paths actually contain \ characters, they have probably caused quite a few headaches so far.

Since neither Unix-like systems nor Win32 allow null characters in pathnames, one way to preserve the \ characters in your pathnames is to first convert them to null bytes, then call File::Spec::Win32->canonpath( ... );, and then convert the null bytes back to \ characters. This is straightforward and needs no looping:

use File::Spec::Win32;

my $path = '/a/../b/./c//d';
$path =~ s[\\][\0]g;   # Converts backslashes to null bytes.
$path = File::Spec::Win32->canonpath($path);
$path =~ s[\\][/]g;   # Converts \ to / characters.
$path =~ s[\0][\\]g;   # Converts null bytes back to backslashes.
# $path is now set to:  /b/c/d
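The same three substitutions can be wrapped in a reusable function. This is a minimal sketch; `normalize_unix_path` is a hypothetical name, and because it relies on File::Spec::Win32's rules, unusual inputs (for example, a first component that looks like a drive letter such as `c:`) may be treated differently from a pure Unix normalization, so it is worth testing against your own paths.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical wrapper: borrow the Win32 flavour of canonpath(), which
# collapses "..", then switch the separators back to "/".
sub normalize_unix_path {
    my ($path) = @_;
    $path =~ s[\\][\0]g;                           # protect literal backslashes
    $path = File::Spec::Win32->canonpath($path);
    $path =~ s[\\][/]g;                            # Win32 "\" separators -> "/"
    $path =~ s[\0][\\]g;                           # restore literal backslashes
    return $path;
}

print normalize_unix_path('/a/../b/./c//d'), "\n";
```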

Upvotes: 1

Georg Mavridis

Reputation: 2341

Fixing Tom van der Woerdt's code:

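use strict;
use warnings;

foreach my $path ("/a/b/c/d/../../../e", "/a/../b/./c//d") {
    my $absolute = $path =~ m!^/!;
    my @c = reverse split m@/@, $path;    # process components right-to-left
    my @c_new;
    while (@c) {
        my $component = shift @c;
        next unless length($component);   # skip empty components from "//"
        if ($component eq ".") { next; }
        if ($component eq "..") {
            # Find the nearest component to the left that is not "", "." or "..".
            my $i = 0;
            while ($i <= $#c && $c[$i] =~ m/^\.{0,2}\z/) {
                $i++;
            }
            if ($i > $#c) {
                # Nothing left to cancel: keep ".." only in relative paths.
                push @c_new, $component unless $absolute;
            } else {
                splice(@c, $i, 1);        # drop the directory this ".." cancels
            }
            next;
        }
        push @c_new, $component;
    }
    my $result = ($absolute ? "/" : "") . join("/", reverse @c_new);
    print(($result eq "" ? "." : $result), "\n");
}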

Upvotes: 2

chrispitude

Reputation: 121

My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:

<include href="../../base/rng/topicMod.rng"/>

and I needed the absolute path of that included file. (The including file path might be absolute or relative.)

Path::Tiny was promising, but I can only use core modules.

I tried using chdir to the include file's location and then File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.

I ended up writing a subroutine to implement a simple string-based method of evaporating '../' components:

#!/usr/bin/perl
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Spec;

sub adjust_local_path {
 my ($file, $relative_to) = @_;
 return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!));  # handle the fast cases

 $relative_to = dirname($relative_to) if (-f $relative_to);
 $relative_to = Cwd::realpath($relative_to);
 # Strip leading "./" and "../" components; each "../" drops one directory.
 while ($file =~ s!^(\.\.?)/!!) { $relative_to =~ s!/[^/]+$!! if $1 eq '..'; }
 return File::Spec->catdir($relative_to, $file);
}

my $including_file = '/home/chrispy/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $include_href = '.././base/rng/topicMod.rng';
print adjust_local_path($include_href, $including_file)."\n";

The result of the script above is:

$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng

Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.

You can provide either a file or a path as the second argument; if it's a file, the directory path of that file is used. (This was convenient for my own purposes.)
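If symlinks do not need to be resolved (the situation in the original question), realpath() can be dropped and the include resolved purely as a string. This is a minimal sketch: `resolve_include` is a hypothetical name, and unlike the subroutine above it does not follow symlinks, so a linked directory such as dita-ot/ stays as written.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: resolve an include href relative to the file that
# contains it, as plain string manipulation (no realpath, no filesystem).
sub resolve_include {
    my ($href, $including_file) = @_;
    my $joined = $href =~ m{^/} ? $href : dirname($including_file) . "/$href";
    my $absolute = $joined =~ m{^/};
    my @stack;
    for my $part (split m{/}, $joined) {
        next if $part eq '' or $part eq '.';
        if ($part eq '..' and @stack and $stack[-1] ne '..') {
            pop @stack;                       # "dir/.." cancels out
        } elsif ($part ne '..' or !$absolute) {
            push @stack, $part;               # ".." above "/" is dropped
        }
    }
    my $result = ($absolute ? '/' : '') . join('/', @stack);
    return length $result ? $result : '.';
}

print resolve_include('../../base/rng/topicMod.rng',
    '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng'), "\n";
```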

Upvotes: 2

Matt Jacob

Reputation: 6553

The Path::Tiny module does exactly this:

use strict;
use warnings;
use 5.010;

use Path::Tiny;
say path('/a/../b/./c//d');

Output:

/b/c/d

Upvotes: 0

Denilson Sá Maia

Reputation: 49477

Given that File::Spec is almost what I needed, I ended up writing a function that removes ../ components from the output of File::Spec->canonpath(). The full code, including tests, is available as a GitHub Gist.

use File::Spec;

sub path_normalize_by_string_manipulation {
    my $path = shift;

    # canonpath does string manipulation, but does not remove "..".
    my $ret = File::Spec->canonpath($path);

    # Let's remove ".." by using a regex.
    while ($ret =~ s{
        (^|/)              # Either the beginning of the string, or a slash, save as $1
        (                  # Followed by one of these:
            [^/]|          #  * Any one character (except slash, obviously)
            [^./][^/]|     #  * Two characters where
            [^/][^./]|     #    they are not ".."
            [^/][^/][^/]+  #  * Three or more characters
        )                  # Followed by:
        /\.\.              # "/..", which must be followed by
        (?:/|\z)           # either "/" or the end of the string
        }{$1}x
    ) {
        # Repeat this substitution until not possible anymore.
    }

    # A final ".." can leave an empty string or a trailing slash behind.
    $ret = '.' if $ret eq '';
    $ret =~ s{(?<=.)/\z}{};

    # Re-adding the trailing slash, if needed.
    if ($path =~ m!/$! && $ret !~ m!/$!) {
        $ret .= '/';
    }

    return $ret;
}

Upvotes: 4

Tom van der Woerdt

Reputation: 29985

Removing '.' and '..' from paths is pretty straightforward if you process the path right-to-left:

use strict;
use warnings;
use feature 'say';

my $path= "/a/../b/./c//d";
my @c= reverse split m@/@, $path;
my @c_new;
while (@c) {
    my $component= shift @c;
    next unless length($component);
    if ($component eq ".") { next; }
    if ($component eq "..") { shift @c; next }
    push @c_new, $component;
}
say "/".join("/", reverse @c_new);

(Assumes the path starts with a /)

Note that this violates the UNIX (POSIX) pathname resolution rules, specifically this part:

A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
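The right-to-left idea can be made more robust by counting pending `..` components instead of discarding the next component with `shift`, which otherwise goes wrong for consecutive `..` such as /a/b/../../c. The sketch below also peels off the leading slashes first, keeping exactly two and collapsing three or more to one, as the quoted rule allows. It is a minimal sketch: `normalize_rtl` is a hypothetical name, and keeping `//` is only one permissible choice, since its meaning is implementation-defined.

```perl
use strict;
use warnings;

# Hypothetical helper: right-to-left normalization that counts pending ".."
# components instead of discarding whatever component follows them.
sub normalize_rtl {
    my ($path) = @_;
    # POSIX leaves exactly two leading slashes implementation-defined, so keep
    # them; a single slash, or three or more, become one "/".
    my ($lead) = $path =~ m{^(/*)};
    my $prefix = length($lead) == 2 ? '//' : length($lead) ? '/' : '';

    my $pending = 0;    # ".." components still waiting for a name to cancel
    my @kept;           # surviving components, collected right-to-left
    for my $part (reverse split m{/}, $path) {
        next if $part eq '' or $part eq '.';
        if ($part eq '..') { $pending++; next }
        if ($pending)      { $pending--; next }    # name cancelled by a ".."
        push @kept, $part;
    }
    # Unmatched ".." survive only in relative paths; "/.." is just "/".
    push @kept, ('..') x $pending unless $prefix;

    my $result = $prefix . join('/', reverse @kept);
    return length $result ? $result : '.';
}

print normalize_rtl($_), "\n" for '/a/b/../../c', '//a/./b', '///a/b', '../../x';
```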

Upvotes: 0
