Andrei Rosu

Reputation: 1387

Regex windows path validator

I've tried to find a Windows file path validator for JavaScript, but none seemed to fulfill my requirements, so I decided to build one myself.

The requirements are the following:

Here is the regex I came up with: /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i

But there are some issues:

var reg = new RegExp(/^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i);
var startList = [
  'C://test',
  'C://te?st.html',
  'C:/test',
  'C://test.html',
  'C://test/hello.html',
  'C:/test/hello.html',
  '//test',
  '/test',
  '//test.html',
  '//10.1.1.107',
  '//10.1.1.107/test.html',
  '//10.1.1.107/test/hello.html',
  '//10.1.1.107/test/hello',
  '//test/hello.txt',
  '/test/html',
  '/tes?t/html',
  '/test.html',
  'test.html',
  '//',
  '/',
  '\\\\',
  '\\',
  '/t!esrtr',
  'C:/hel**o'
];

startList.forEach(item => {
  document.write(reg.test(item) + '  >>>   ' + item);
  document.write("<br>");
});
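One possible cause of unexpected matches (an assumption, since the specific issues are not listed above) is that the pattern has no end anchor, so test() succeeds as soon as a valid prefix is found. A minimal sketch comparing the original pattern with a hypothetical variant that adds a trailing $:

```javascript
// The question's pattern, unchanged: anchored at the start only.
const unanchored = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i;
// Hypothetical variant with a trailing `$`, so the whole string must match.
const anchored = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+$/i;

for (const path of ['C://test', 'C://te?st.html', 'C:/hel**o']) {
  // Columns: path, result without `$`, result with `$`.
  console.log(path, unanchored.test(path), anchored.test(path));
}
```

Without the anchor, 'C://te?st.html' passes because the match simply stops before the "?"; with the anchor it is rejected.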

Upvotes: 4

Views: 18718

Answers (3)

NetXpert

Reputation: 649

Since this post seems to be one of the top results when searching for a regex that validates Windows paths, and given the caveats and weaknesses of the other proposed solutions, I'll include the solution that I use for validating Windows paths. I believe it addresses all of the points raised previously for that use case.

I could not come up with a single viable regex, with or without lookaheads and lookbehinds, that would do the job, but I could do it with two, without any lookaheads or lookbehinds.

Note, though, that successive relative paths (e.g. "..\..\folder\file.exe") will not pass this pattern, although a single "..\" or ".\" at the beginning of the string will. Periods and spaces immediately before or after slashes, or at the end of the line, cause the path to fail, as does any character not permitted by Microsoft's short-filename specification: https://learn.microsoft.com/en-us/windows/win32/msi/filename

First Pattern:

^   (?# <- Start at the beginning of the line #)
    (?# validate the opening drive or path delimiter, if present -> #)
        (?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
                (?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
            | (?# or "\", "..\", ".\", "\\" -> #)
                (?:[\/\\]{1,2}|\.{1,2}[\/\\])
        )?
    (?# validate the form and content of the body -> #)
        (?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+
$   (?# <- End at the end of the line. #)

This will generally validate the path structure and character validity, but it also allows problematic things like double periods, double backslashes, and periods or backslashes that are preceded or followed by spaces or periods. Paths that end with spaces or periods are also permitted. To address these problems, I perform a second test with another (similar) pattern:

^   (?# <- Start at the beginning of the line #)
    (?# validate the opening drive or path delimiter, if present -> #)
        (?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
                (?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
            | (?# or "\", "..\", ".\", "\\" -> #)
                (?:[\/\\]{1,2}|\.{1,2}[\/\\])
        )?
    (?# ensure that undesired patterns aren't present in the string -> #)
        (?:([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*
    [^\x00-\x1A|*?\s+,;"'`:<.>=[\]]) (?# <- Ensure that the last character is valid #)
$   (?# <- End at the end of the line. #)

This validates that, within the path body, no multiple-periods, multiple-slashes, period-slashes, space-slashes, slash-spaces or slash-periods occur, and that the path doesn't end with an invalid character. Annoyingly, I have to re-validate the <root> group because it's the one place where some of these combinations are allowed (i.e. ".\", "\\", and "..\") and I don't want those to invalidate the pattern.

Here is an implementation of my test (in C#):

// Requires: using System.IO; using System.Text.RegularExpressions;

/// <summary>Performs pattern testing on a string to see if it's in a form recognizable as a Windows path.</summary>
/// <param name="test">The string to test.</param>
/// <param name="testExists">If TRUE, this also verifies that the specified path exists.</param>
/// <returns>TRUE if the contents of the passed string are valid, and, if requested, the path exists.</returns>
public bool ValidatePath( string test, bool testExists = false )
{
    // Return early: Regex.IsMatch throws ArgumentNullException when given a null input.
    if ( string.IsNullOrWhiteSpace( test ) ) return false;

    string 
        drivePattern = /* language=regex */ 
           @"^(([A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|([\/\\]{1,2}|\.{1,2}[\/\\]))?",
        pattern = drivePattern + /* language=regex */ 
           @"([^\x00-\x1A|*?\t\v\f\r\n+\/,;""'`\\:<>=[\]]+[\/\\]?)+$";
    // First pass: overall structure and permitted characters.
    bool result = Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );
    // Second pass: reject adjacent periods, slashes and spaces, and an invalid last character.
    pattern = drivePattern + /* language=regex */
        @"(([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;""'`:<.>=[\]])$";
    result &= Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );
    // Accept either an existing directory or an existing file.
    return result && (!testExists || Directory.Exists( test ) || File.Exists( test ));
}
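Since the question asks about JavaScript, here is a rough sketch of how the same two-pass test might be ported. JavaScript has no (?# ... ) comments or free-spacing mode, so the patterns are written compactly; the names ROOT, BODY, STRICT_BODY and validatePath are hypothetical, and the existence check is left out because it depends on the runtime.

```javascript
// Optional root: "C:", "C:\", "C:.\", "C:..\", or "\", "\\", ".\", "..\".
const ROOT = /^(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?|[\/\\]{1,2}|\.{1,2}[\/\\])?/;
// First pass: runs of permitted characters, each optionally followed by one slash.
const BODY = /(?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+$/;
// Second pass: no adjacent periods/slashes/spaces, and a valid last character.
const STRICT_BODY = /(?:(?:[^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;"'`:<.>=[\]])$/;

// Both passes share the same root prefix, as in the C# version.
const structurePattern = new RegExp(ROOT.source + BODY.source);
const sequencePattern = new RegExp(ROOT.source + STRICT_BODY.source);

// Hypothetical helper: a path passes only if both patterns match.
function validatePath(path) {
  if (typeof path !== 'string' || path.trim() === '') return false;
  return structurePattern.test(path) && sequencePattern.test(path);
}

console.log(validatePath('C:\\Windows\\System32'));     // true
console.log(validatePath('..\\folder\\file.exe'));      // true
console.log(validatePath('..\\..\\folder\\file.exe'));  // false (successive relative segments)
console.log(validatePath('C:\\folder \\file.txt'));     // false (space before a backslash)
```

As in the original patterns, the drive letter class is [A-Z] with no case-insensitive flag, so this sketch keeps that behaviour rather than guessing at a different intent.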

Upvotes: 2

Julio

Reputation: 5308

This may work for you: ^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$

You have a demo here

Explained:

^
    (?!.*[\\\/]\s+)         # Disallow whitespace right after a slash (names beginning with spaces)
    (?!(?:.*\s|.*\.|\W+)$)  # Disallow paths that end with a dot/space or consist only of non-word chars (e.g. only slashes)
    
    (?:[a-zA-Z]:)? # Drive letter (optional)
    
    (?:
          (?:[^<>:"\|\?\*\n])+  # Word (one or more characters outside the forbidden set)
          (?:\/\/|\/|\\\\|\\)?  # Separator (// or / or \\ or \); optional
     )+ # Repeated one or more times
     
$
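A quick way to try the pattern above in JavaScript. The variable name pathPattern and the sample paths are only illustrative:

```javascript
// The single-line pattern from this answer, as a JavaScript regex literal.
const pathPattern = /^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$/;

const samples = [
  'C:/test/hello.html', // true
  'C:/te?st.html',      // false: "?" is in the forbidden set
  'C:/ test',           // false: whitespace right after a slash
  'C:/test.',           // false: ends with a dot
  '//',                 // false: only non-word characters
];

for (const sample of samples) {
  console.log(pathPattern.test(sample), sample);
}
```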

Upvotes: 3

Valdi_Bo

Reputation: 30971

Unfortunately, the JavaScript flavour of regex historically did not support lookbehinds (they were only added in ES2018), but fortunately it does support lookaheads, and this is the key to constructing the regex.

Let's start from some observations:

  1. After a dot, slash, backslash or space, there cannot be another dot, slash or backslash. The set of "forbidden" chars also includes \n, because none of these chars can be the last char of the file name or of one of its segments (between dots or (back-)slashes).

  2. The other chars allowed in the path are the chars which you mentioned (other than ...), but the "exclusion list" must also include a dot, slash, backslash, space and \n (the chars mentioned in point 1).

  3. After the "initial part" (C:\) there can be multiple instances of the chars mentioned in point 1 or 2.

Taking these points into account, I built the regex from the following parts:

  • "Starting" part, matching the optional drive letter, a colon and up to 2 slashes (forward or backward).
  • The first alternative: a dot, slash, backslash or space, followed by a negative lookahead listing the chars "forbidden" after each of them (see point 1).
  • The second alternative: the chars mentioned in point 2.
  • Both of the above alternatives can occur multiple times (+ quantifier).

So the regex is as follows:

  • ^ - Start of the string.
  • (?:[a-z]:)? - Drive letter and a colon, optional.
  • [\/\\]{0,2} - Either a backslash or a slash, between 0 and 2 times.
  • (?: - Start of the non-capturing group, needed due to the + quantifier after it.
    • [.\/\\ ] - The first alternative.
    • (?![.\/\\\n]) - Negative lookahead - "forbidden" chars.
  • | - Or.
    • [^<>:"|?*.\/\\ \n] - The second alternative.
  • )+ - End of the non-capturing group, may occur multiple times.
  • $ - End of the string.
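Put together, the pieces listed above give the following pattern. This is a minimal sketch with a hypothetical variable name and sample paths, using only the i option:

```javascript
// Start, optional drive, up to two slashes, then one or more of:
//   a dot/slash/backslash/space not followed by a "forbidden" char, or
//   any char outside the exclusion list.
const pathRegex = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

console.log(pathRegex.test('C:/test/hello.html'));     // true
console.log(pathRegex.test('//10.1.1.107/test.html')); // true
console.log(pathRegex.test('C://te?st.html'));         // false: "?" is excluded
console.log(pathRegex.test('C:/test//hello'));         // false: slash followed by a slash
console.log(pathRegex.test('C:/hel**o'));              // false: "*" is excluded
```

Note that when a single string is tested, a trailing separator such as 'C:/test/' still matches, because the lookahead only rejects a following dot, slash, backslash or \n, and nothing follows the last character of the string.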

If you match each path separately, use only the i option.

But if you have multiple paths on separate lines and match them globally in one go, also add the g and m options.
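A small sketch of the multi-line case, assuming the paths arrive as one newline-separated string (the variable names and sample data are illustrative):

```javascript
// Same pattern as described above, with g (all matches) and m (^/$ per line) added.
const multiLinePattern = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/gim;

const text = ['C:/test', 'C://te?st.html', '//10.1.1.107/test.html'].join('\n');

// match() with the g flag returns every matching line, or null if none match.
const validPaths = text.match(multiLinePattern) || [];
console.log(validPaths); // [ 'C:/test', '//10.1.1.107/test.html' ]
```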

For a working example see https://regex101.com/r/4JY31I/1

Note: I suppose that ! should also be treated as a forbidden character. If you agree, add it to the second alternative, e.g. after *.
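For illustration, this is what the variant with ! added to the second alternative might look like (the variable name is hypothetical):

```javascript
// As before, but "!" is added to the exclusion list right after "*".
const noBangRegex = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*!.\/\\ \n])+$/i;

console.log(noBangRegex.test('/t!esrtr')); // false
console.log(noBangRegex.test('/test'));    // true
```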

Upvotes: 11
