Silverlan

Reputation: 2911

Alternative to 'realpath' to resolve "../" and "./" in a path

realpath does what I need, but only works if the files in the path actually exist.

I need a function that returns a normalized path from a string (e.g. ../some/./directory/a/b/c/../d to some/directory/a/b/d), regardless of whether the directories/files actually exist.

Essentially the equivalent of PathCanonicalize on Windows.

Does such a function already exist?

Upvotes: 13

Views: 6415

Answers (6)

sjnarv

Reputation: 2374

Another attempt. Quirks/features of this one:

  • does not canonicalize into the source string; writes to caller-supplied space
  • has a notion of absolute vs relative path (did the source path begin with '/' ?): if enough '..' are present to eat all the source, emits a '/' for an absolute path, and a '.' for relative
  • has no notion whether the elements in the source path correspond to actual filesystem objects
  • uses C99 variable-length arrays; since the result goes into caller-supplied space there is no malloc, but it makes a couple of copies under the hood.
  • given those copies, source and destination can be the same
  • uses strtok_r(3), whose quirks around not returning zero-length tokens seem to match desired behavior for adjacent '/' characters.
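To see the strtok_r(3) behavior that the last bullet relies on, here is a small standalone demo (list_tokens is a made-up helper for illustration, not part of the answer's code). Runs of '/' never produce empty tokens, while "." and ".." come back as ordinary tokens for the caller to interpret. The _POSIX_C_SOURCE define is an assumption for strict -std=c99 builds, where strtok_r is not declared by default.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r() is POSIX, not ISO C99 */
#include <stdio.h>
#include <string.h>

/* list_tokens: hypothetical demo helper. Copies the '/'-separated tokens of
 * path into out, joined by '|', so you can see exactly what strtok_r returns.
 * Assumes outsz > 0. */
void list_tokens(const char *path, char *out, size_t outsz)
{
    char buf[256];                /* assumption: demo paths are short */
    char *sav, *tok;
    size_t used = 0;

    snprintf(buf, sizeof buf, "%s", path);   /* strtok_r writes into its input */
    out[0] = '\0';
    for (tok = strtok_r(buf, "/", &sav); tok != NULL;
         tok = strtok_r(NULL, "/", &sav)) {
        int n = snprintf(out + used, outsz - used, "%s%s", used ? "|" : "", tok);
        if (n < 0 || (size_t)n >= outsz - used)
            break;                /* out is full; it stays NUL-terminated */
        used += (size_t)n;
    }
}
```

For example, list_tokens("/a//b/./c/", out, sizeof out) leaves "a|b|.|c" in out: the doubled and trailing slashes vanish, and the "." is left for the canonicalizer to drop.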

Source:

#define _POSIX_C_SOURCE 200809L  /* strtok_r(3) is POSIX; needed under strict -std=c99 */
#include <stdlib.h>
#include <string.h>

int
pathcanon(const char *srcpath, char *dstpath, size_t sz)
{
    size_t plen = strlen(srcpath) + 1, chk;
    char wtmp[plen], *tokv[plen], *s, *tok, *sav;
    int i, ti, relpath;

    relpath = (*srcpath == '/') ? 0 : 1;

    /* make a local copy of srcpath so strtok_r(3) won't mangle it */

    ti = 0;
    (void) strcpy(wtmp, srcpath);

    tok = strtok_r(wtmp, "/", &sav);
    while (tok != NULL) {
        if (strcmp(tok, "..") == 0) {
            if (ti > 0) {
                ti--;
            }
        } else if (strcmp(tok, ".") != 0) {
            tokv[ti++] = tok;
        }
        tok = strtok_r(NULL, "/", &sav);
    }

    chk = 0;
    s = dstpath;

    /*
     * Construct canonicalized result, checking for room as we
     * go. Running out of space leaves dstpath unusable: written
     * to and *not* cleanly NUL-terminated.
     */
    for (i = 0; i < ti; i++) {
        size_t l = strlen(tokv[i]);

        if (i > 0 || !relpath) {
            if (++chk >= sz) return -1;
            *s++ = '/';
        }

        chk += l;
        if (chk >= sz) return -1;

        strcpy(s, tokv[i]);
        s += l;
    }

    if (s == dstpath) {
        if (++chk >= sz) return -1;
        *s++ = relpath ? '.' : '/';
    }
    *s = '\0';

    return 0;
}

Edit: missed the check for room when s == dstpath. Legit callers will likely provide more than 0 or 1 byte of target storage, but it's a tough world out there.

Upvotes: 5

Roman Susi

Reputation: 4199

Python source code has an implementation of os.path.normpath for several platforms. The POSIX one (in Lib/posixpath.py: line 318 for Python 3, line 308 for Python 2) is unfortunately written in Python, but its logic can easily be reimplemented in C, as the function is quite compact. It has been tested by many years of use.
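For reference, here is a minimal C sketch of that logic (normpath_c is a hypothetical name; this port is written for illustration, not taken from CPython). Like the Python version it works purely on the string: it collapses repeated slashes, keeps exactly two leading slashes (POSIX leaves "//" implementation-defined), and drops ".." directly under the root. Unlike the example in the question, it keeps leading ".." components of a relative path, since "../x" and "x" name different places.

```c
#include <stdlib.h>
#include <string.h>

/* normpath_c: hypothetical C port of the logic in CPython's posixpath.normpath().
 * Purely lexical: the filesystem is never consulted.
 * Returns a malloc'd string (caller frees) or NULL if allocation fails. */
char *normpath_c(const char *path)
{
    size_t len = strlen(path);
    size_t n = 0;                          /* number of components kept so far */
    int initial = 0;                       /* leading slashes to preserve */
    const char **start = malloc((len + 1) * sizeof *start);
    size_t *clen = malloc((len + 1) * sizeof *clen);
    char *out = malloc(len + 2);           /* never longer than path, or "." */

    if (!start || !clen || !out) {
        free(start); free(clen); free(out);
        return NULL;
    }

    /* Keep "//" as is; one, or three or more, leading slashes become one. */
    if (path[0] == '/')
        initial = (path[1] == '/' && path[2] != '/') ? 2 : 1;

    for (const char *p = path; *p; ) {
        while (*p == '/')                  /* skip separators (empty components) */
            p++;
        if (*p == '\0')
            break;
        const char *s = p;
        while (*p && *p != '/')
            p++;
        size_t l = (size_t)(p - s);

        if (l == 1 && s[0] == '.')
            continue;                      /* "." adds nothing */
        int dotdot = (l == 2 && s[0] == '.' && s[1] == '.');
        int prev_dotdot = n > 0 && clen[n - 1] == 2 &&
                          start[n - 1][0] == '.' && start[n - 1][1] == '.';
        if (!dotdot || (!initial && n == 0) || prev_dotdot) {
            start[n] = s;                  /* keep this component */
            clen[n] = l;
            n++;
        } else if (n > 0) {
            n--;                           /* "name/.." cancels out */
        }
        /* else: ".." directly under the root of an absolute path is dropped */
    }

    char *o = out;
    for (int i = 0; i < initial; i++)
        *o++ = '/';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            *o++ = '/';
        memcpy(o, start[i], clen[i]);
        o += clen[i];
    }
    if (o == out)
        *o++ = '.';                        /* empty result means "current directory" */
    *o = '\0';

    free(start);
    free(clen);
    return out;
}
```

For example, normpath_c("../some/./directory/a/b/c/../d") returns "../some/directory/a/b/d", and normpath_c("/../a//b/") returns "/a/b". If you want the question's behavior of discarding leading "..", drop the (!initial && n == 0) condition so such components fall through to the "dropped" case.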

There are normpath implementations for other platforms in the Python interpreter and standard library source as well, so a portable solution could combine those.

Other systems and libraries written in C probably have implementations of the same thing too, since normpath is critical from a security standpoint.

(A further advantage of having the Python code is that you can test your C function against it in parallel on arbitrary, even random, input; this kind of testing is important for making the function secure.)

Upvotes: 7

snap

Reputation: 2792

I do not think there is any standard library function available for this.

You can use the function ap_getparents() from the Apache httpd source file server/util.c. I believe it does exactly what you want: https://github.com/apache/httpd/blob/trunk/server/util.c#L500

#ifdef WIN32
#define IS_SLASH(s) ((s == '/') || (s == '\\'))
#else
#define IS_SLASH(s) (s == '/')
#endif

void ap_getparents(char *name)
{
    char *next;
    int l, w, first_dot;

    /* Four passes, as per RFC 1808 */
    /* a) remove ./ path segments */
    for (next = name; *next && (*next != '.'); next++) {
    }

    l = w = first_dot = next - name;
    while (name[l] != '\0') {
        if (name[l] == '.' && IS_SLASH(name[l + 1])
            && (l == 0 || IS_SLASH(name[l - 1])))
            l += 2;
        else
            name[w++] = name[l++];
    }

    /* b) remove trailing . path, segment */
    if (w == 1 && name[0] == '.')
        w--;
    else if (w > 1 && name[w - 1] == '.' && IS_SLASH(name[w - 2]))
        w--;
    name[w] = '\0';

    /* c) remove all xx/../ segments. (including leading ../ and /../) */
    l = first_dot;

    while (name[l] != '\0') {
        if (name[l] == '.' && name[l + 1] == '.' && IS_SLASH(name[l + 2])
            && (l == 0 || IS_SLASH(name[l - 1]))) {
            int m = l + 3, n;

            l = l - 2;
            if (l >= 0) {
                while (l >= 0 && !IS_SLASH(name[l]))
                    l--;
                l++;
            }
            else
                l = 0;
            n = l;
            while ((name[n] = name[m]))
                (++n, ++m);
        }
        else
            ++l;
    }

    /* d) remove trailing xx/.. segment. */
    if (l == 2 && name[0] == '.' && name[1] == '.')
        name[0] = '\0';
    else if (l > 2 && name[l - 1] == '.' && name[l - 2] == '.'
             && IS_SLASH(name[l - 3])) {
        l = l - 4;
        if (l >= 0) {
            while (l >= 0 && !IS_SLASH(name[l]))
                l--;
            l++;
        }
        else
            l = 0;
        name[l] = '\0';
    }
}

(This is assuming re-using Apache Licensed code in your project is acceptable.)
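If reusing Apache-licensed code is not an option: RFC 3986, which obsoletes RFC 1808, specifies the same kind of dot-segment removal as a single-pass remove_dot_segments algorithm (section 5.2.4). Below is a minimal C sketch of that algorithm, written for illustration (it is not Apache code). Like ap_getparents(), it drops leading "../", so the question's example comes out as expected.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of remove_dot_segments from RFC 3986, section 5.2.4 (rules A-E).
 * Purely lexical. Returns a malloc'd string (caller frees) or NULL. */
char *remove_dot_segments(const char *path)
{
    const char *in = path;                  /* unread part of the input buffer */
    char *out = malloc(strlen(path) + 1);   /* output never outgrows the input */
    size_t o = 0;                           /* current length of the output */

    if (out == NULL)
        return NULL;

    while (*in) {
        if (strncmp(in, "../", 3) == 0) {
            in += 3;                        /* A: drop a "../" prefix */
        } else if (strncmp(in, "./", 2) == 0) {
            in += 2;                        /* A: drop a "./" prefix */
        } else if (strncmp(in, "/./", 3) == 0) {
            in += 2;                        /* B: "/./x" -> "/x" */
        } else if (strcmp(in, "/.") == 0) {
            out[o++] = '/';                 /* B: "/." -> "/", moved straight */
            in += 2;                        /*    to the output (rule E) */
        } else if (strncmp(in, "/../", 4) == 0 || strcmp(in, "/..") == 0) {
            /* C: remove the last output segment and the '/' before it */
            while (o > 0 && out[--o] != '/')
                ;
            in += 3;                        /* "/../x" -> "/x" */
            if (*in == '\0')
                out[o++] = '/';             /* "/.." -> "/" */
        } else if (strcmp(in, ".") == 0 || strcmp(in, "..") == 0) {
            in += strlen(in);               /* D: a lone "." or ".." */
        } else {
            /* E: move "/segment" (or "segment") to the end of the output */
            if (*in == '/')
                out[o++] = *in++;
            while (*in != '\0' && *in != '/')
                out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return out;
}
```

For example, remove_dot_segments("../some/./directory/a/b/c/../d") returns "some/directory/a/b/d". One caveat of following the RFC literally: it was designed for URI paths, so a relative input such as "a/../b" yields "/b"; a filesystem-oriented version would drop that leading '/' when the input did not start with one.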

Upvotes: 10

David C. Rankin

Reputation: 84561

According to your problem statement, the following does what you ask. The bulk of the code comes from path.c, linked in the comments; the modification that removes a leading ../ was added to match your problem statement:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void pathCanonicalize (char *path);

int main (int argc, char **argv)
{
    if (argc < 2) {
        fprintf (stderr, "error: insufficient input, usage: %s <path>\n",
                argv[0]);
        return 1;
    }

    char *fullpath = strdup (argv[1]);
    if (!fullpath) {
        fprintf (stderr, "error: virtual memory exhausted.\n");
        return 1;
    }

    pathCanonicalize (fullpath);

    printf ("\n original : %s\n canonical: %s\n\n", argv[1], fullpath);

    free (fullpath);

    return 0;
}

void pathCanonicalize (char *path)
{
    size_t i;
    size_t j;
    size_t k;

    //Move to the beginning of the string
    i = 0;
    k = 0;

    //Replace backslashes with forward slashes and collapse runs of separators into one '/'
    while (path[i] != '\0') {
        //Forward slash or backslash separator found?
        if (path[i] == '/' || path[i] == '\\') {
            path[k++] = '/';
            while (path[i] == '/' || path[i] == '\\')
                i++;
        } else {
            path[k++] = path[i++];
        }
    }

    //Properly terminate the string with a NUL character
    path[k] = '\0';

    //Move back to the beginning of the string
    i = 0;
    j = 0;
    k = 0;

    //Parse the entire string
    do {
        //Forward slash separator found?
        if (path[i] == '/' || path[i] == '\0') {
            //"." element found?
            if ((i - j) == 1 && !strncmp (path + j, ".", 1)) {
                //Check whether the pathname is empty?
                if (k == 0) {
                    if (path[i] == '\0') {
                        path[k++] = '.';
                    } else if (path[i] == '/' && path[i + 1] == '\0') {
                        path[k++] = '.';
                        path[k++] = '/';
                    }
                } else if (k > 1) {
                    //Remove the final slash if necessary
                    if (path[i] == '\0')
                        k--;
                }
            }
            //".." element found?
            else if ((i - j) == 2 && !strncmp (path + j, "..", 2)) {
                //Check whether the pathname is empty?
                if (k == 0) {
                    path[k++] = '.';
                    path[k++] = '.';

                    //Append a slash if necessary
                    if (path[i] == '/')
                        path[k++] = '/';
                } else if (k > 1) {
                    //Search the path for the previous slash
                    for (j = 1; j < k; j++) {
                        if (path[k - j - 1] == '/')
                            break;
                    }

                    //Slash separator found?
                    if (j < k) {
                        if (!strncmp (path + k - j, "..", 2)) {
                            path[k++] = '.';
                            path[k++] = '.';
                        } else {
                            k = k - j - 1;
                        }

                        //Append a slash if necessary
                        if (k == 0 && path[0] == '/')
                            path[k++] = '/';
                        else if (path[i] == '/')
                            path[k++] = '/';
                    }
                    //No slash separator found?
                    else {
                        if (k == 3 && !strncmp (path, "..", 2)) {
                            path[k++] = '.';
                            path[k++] = '.';

                            //Append a slash if necessary
                            if (path[i] == '/')
                                path[k++] = '/';
                        } else if (path[i] == '\0') {
                            k = 0;
                            path[k++] = '.';
                        } else if (path[i] == '/' && path[i + 1] == '\0') {
                            k = 0;
                            path[k++] = '.';
                            path[k++] = '/';
                        } else {
                            k = 0;
                        }
                    }
                }
            } else {
                //Copy directory name
                memmove (path + k, path + j, i - j);
                //Advance write pointer
                k += i - j;

                //Append a slash if necessary
                if (path[i] == '/')
                    path[k++] = '/';
            }

            //Move to the next token
            while (path[i] == '/')
                i++;
            j = i;
        }
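        else if (i == 0) {
            //Relative path: skip leading "./" and "../" components whole
            //(so "../a" becomes "a"), but keep names that merely contain
            //dots, such as "a.b" or ".hidden"
            while (path[i] == '.') {
                size_t n = (path[i + 1] == '.') ? 2 : 1;
                if (path[i + n] != '/')
                    break;
                i += n + 1;
            }
            j = i;
        }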
    } while (path[i++] != '\0');

    //Properly terminate the string with a NUL character
    path[k] = '\0';
}

Use/Output

$ ./bin/pathcanonical ../some/./directory/a/b/c/../d

 original : ../some/./directory/a/b/c/../d
 canonical: some/directory/a/b/d

Upvotes: 7

Peter

Reputation: 36597

I assume your host is Windows or Unix (both use .., ., and / to mean parent directory, current directory, and directory separator, respectively), and that your library provides the POSIX-specified function getcwd(), which retrieves the current working directory of your program (i.e. the directory where files will be created if opened with a filename that has no path specification).

First call getcwd() to retrieve the working directory. If its last character is '/', prepend it to your input string unchanged; otherwise prepend it followed by a '/'.
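A minimal C sketch of this first step, assuming a POSIX system. prepend_cwd is a hypothetical helper name, and the fixed 4096-byte buffer is an assumption; a robust version would grow the buffer when getcwd() fails with ERANGE.

```c
#define _POSIX_C_SOURCE 200809L   /* getcwd() is POSIX, declared in <unistd.h> */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* prepend_cwd: hypothetical helper. Returns a malloc'd "<cwd>/<rel>"
 * (caller frees), or NULL if getcwd() or malloc() fails. */
char *prepend_cwd(const char *rel)
{
    char cwd[4096];                        /* assumption: large enough for the cwd */
    if (getcwd(cwd, sizeof cwd) == NULL)   /* e.g. ERANGE if the buffer is too small */
        return NULL;

    size_t clen = strlen(cwd);
    int add_slash = (clen == 0 || cwd[clen - 1] != '/');  /* "/" already ends in '/' */
    char *out = malloc(clen + (size_t)add_slash + strlen(rel) + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, cwd, clen);
    if (add_slash)
        out[clen++] = '/';
    strcpy(out + clen, rel);
    return out;
}
```

If the working directory is /home/user, prepend_cwd("../x/y") gives "/home/user/../x/y", which the string processing described next reduces to "/home/x/y".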

Then just process the string. Find the first instance of "../" and remove it together with the path component that precedes it. For example, if the string is "/a/b/c/../foo", the result will be "/a/b/foo". Repeat until no instances of "../" remain in the string.

The only caveat is deciding what to do with strings like "/../", which refer to a parent of the root directory (POSIX allows ".." in the root directory to refer to the root itself). Either leave that as "/" (so you always get a feasible path) or report an error.

Once that is done, look for instances of "/./" and replace them with a "/". This will turn strings like "/a/b/c/./" into "/a/b/c/" but will leave strings like "/a/b/c./" (which specify a directory named "c." within "/a/b") alone.
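A minimal C sketch of this string rewriting, assuming the input is already absolute (for example, after prepending the working directory) and contains no doubled slashes; resolve_dots is a hypothetical name. It departs from the description in two deliberate ways. First, it removes "/./" before resolving "..", because otherwise "/a/./../b" would let the ".." cancel the "." and give "/a/b" instead of "/b". Second, it searches for "/../" rather than "../", so a name ending in dots, such as "b..", is not mistaken for a parent reference. A root-level "/../" is left as "/".

```c
#include <stdlib.h>
#include <string.h>

/* resolve_dots: hypothetical helper. Lexically resolves "." and ".." in an
 * ABSOLUTE path (must start with '/') by plain string rewriting.
 * Assumes no runs of "//". Returns a malloc'd string or NULL. */
char *resolve_dots(const char *abs)
{
    size_t len = strlen(abs);
    char *p = malloc(len + 2);
    char *q;
    int appended = 0;

    if (p == NULL)
        return NULL;
    memcpy(p, abs, len + 1);
    /* A temporary trailing '/' lets "/." and "/.." at the very end match too. */
    if (len == 0 || p[len - 1] != '/') {
        p[len] = '/';
        p[len + 1] = '\0';
        appended = 1;
    }

    /* "/./" -> "/" first, so that a later ".." cannot cancel a "." */
    while ((q = strstr(p, "/./")) != NULL)
        memmove(q, q + 2, strlen(q + 2) + 1);

    /* "/name/../" -> "/"; at the root, "/../" -> "/" */
    while ((q = strstr(p, "/../")) != NULL) {
        char *prev = q;
        while (prev > p && *--prev != '/')
            ;                              /* back up to the '/' before "name" */
        memmove(prev, q + 3, strlen(q + 3) + 1);
    }

    size_t n = strlen(p);
    if (appended && n > 1 && p[n - 1] == '/')
        p[n - 1] = '\0';                   /* drop the temporary trailing '/' */
    return p;
}
```

For example, resolve_dots("/a/b/c/../foo") returns "/a/b/foo", resolve_dots("/a/b/c/./") returns "/a/b/c/", and resolve_dots("/a/b/c./") is returned unchanged.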

All of the above is just processing the string. Apart from the usage of getcwd(), there is nothing that relies on the host environment. So the process will be the same regardless of whether a path actually exists.

A few bells and whistles might include making it work better with Windows, such as treating '/' and '\' as equivalent, and coping with drive specifiers like "a:".
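For the Windows case, a small pre-pass could convert the separators and report a drive specifier, so that the rest of the text can be processed as above. This is a sketch under the assumption that a drive specifier is a single letter followed by ':' (UNC paths such as \\server\share are not handled); windows_prepass is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* windows_prepass: hypothetical helper. Converts every '\' to '/' in place and
 * returns the length of a leading drive specifier such as "a:" (0 if none),
 * so the remainder (path + result) can be normalized like a Unix path. */
size_t windows_prepass(char *path)
{
    for (char *s = path; *s; s++)
        if (*s == '\\')
            *s = '/';
    if (isalpha((unsigned char)path[0]) && path[1] == ':')
        return 2;
    return 0;
}
```

For example, after windows_prepass on the string c:\a\..\b, the buffer holds "c:/a/../b" and the call returns 2, leaving "/a/../b" to be processed.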

If you don't want to call getcwd() (e.g. if your program does not rely on actually having a working directory, or if it has one that doesn't exist) then you will need to specify a starting condition. For example, where will a string like "../x/y/z" end up?

What I've suggested does allow the . character to be part of filenames (or directory names), which you may or may not want. Adjust as needed.

Upvotes: 4

paulsm4

Reputation: 121649

It sounds like you're on *nix (for example, Linux).

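Q: Does your C library have canonicalize_file_name()? (It is a GNU extension equivalent to realpath(path, NULL), so, like realpath, it only works for paths that exist.)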

Otherwise, if you're programming in C++, you might want to consider Boost:

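boost::filesystem::canonical

Note, though, that canonical() also requires the path to exist. For purely lexical normalization, recent Boost.Filesystem releases and C++17's std::filesystem provide path::lexically_normal().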

Upvotes: 1
