Robin

Reputation: 1645

PHP Regex for matching a UNC path

I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:

\\server\something

... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.

I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string"), and I think I understand why: PHP string escaping comes first (2 = 1, so 4 = 2), then regex engine escaping (the remaining 2 = 1). I've seen the following two quoted as equivalent, suitable regexes for matching a single backslash:

$regex = "/\\\\/s";

or apparently this also:

$regex = "/[\\]/s";

However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.

To see if I could match two backslashes I used the following to test:

$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path)) 
{
    echo "matched";
}
else
{
    echo "not matched";
}

The above however seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?

I thought perhaps the following would work:

$regex = "/[\\][\\]/s";

and again, no :(

Please help before I jump out a window lol :)

Upvotes: 3

Views: 5742

Answers (2)

Kaii

Reputation: 20540

Use this little gem:

$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';

Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)

The RegEx shown above matches a valid hostname (which allows only a few characters) followed by the path part behind the hostname (which allows many, but not all, characters).
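
A minimal sketch of how the pattern above might be used for form validation. The function name is_valid_unc_path is hypothetical, and stripping the trailing backslash with rtrim is just one way to handle the consistency point from the question:

```php
<?php
// Hypothetical helper (not part of the original answer): checks a string
// against the UNC pattern quoted above.
function is_valid_unc_path(string $path): bool
{
    $unc_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';

    // Strip trailing backslashes first, so "\\server\share\" is accepted.
    $path = rtrim($path, '\\');

    return preg_match($unc_regex, $path) === 1;
}

var_dump(is_valid_unc_path('\\\\server\\something'));        // bool(true)
var_dump(is_valid_unc_path('\\\\server\\something\\sub\\')); // bool(true)
var_dump(is_valid_unc_path('\\server\\something'));          // bool(false)
var_dump(is_valid_unc_path('\\\\server'));                   // bool(false)
```

Note that a bare \\server without a share is rejected, because the pattern requires at least one path part after the hostname.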


Sidenote on the backslashes issue:

  • When you use double quotes (") to enclose your string, you must be aware of PHP special character escaping: "\\" is a single \ in PHP.

  • Important: even with single quotes (') those backslashes must be escaped.
    A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
    1. A backslash followed by a backslash (\\) is interpreted as a single backslash.
      ('C:\\*.*' => C:\*.*)
    2. A backslash followed by a single-quote (\') is interpreted as a single quote.
      ('I\'ll be back' => I'll be back)
    3. A backslash followed by anything else is interpreted as a backslash.
      ('Just a \ somewhere' => Just a \ somewhere)

  • Also, you must be aware of PCRE escape sequences.
    The RegEx parser uses \ to introduce escape sequences and shorthand character classes (such as \d), so a literal backslash has to be escaped again for the RegEx engine.
    To match two \\ the pattern body must be written as "\\\\\\\\" or '\\\\\\\\' (plus delimiters).

    From the PHP docs on PCRE escape sequences:

    Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
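
A short sketch of the two escaping layers described above. The variable names are only illustrative; echoing the pattern shows what PCRE actually receives after PHP has parsed the string literal:

```php
<?php
// Layer 1: PHP string parsing. Each '\\' in the source becomes one backslash.
$pattern = '/\\\\/';
echo $pattern, "\n";       // prints /\\/
echo strlen('\\\\'), "\n"; // prints 2

// Layer 2: PCRE parsing. The regex \\ matches one literal backslash.
$subject = 'a\\b';         // the three characters: a, backslash, b
echo preg_match_all($pattern, $subject), "\n"; // prints 1

// preg_quote() adds the regex-level escaping for you.
echo preg_quote('\\\\server'), "\n"; // prints \\\\server
```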


Regarding your Question:

why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?

The reason is that the pattern has no anchors (^ for the beginning and $ for the end of the string), so it finds \\ "somewhere" in the input, which counts as a match. With three backslashes, the first two already satisfy it. To get the expected result, you should do something like this:

$regex = '/^\\\\\\\\[^\\\\]/s';

The RegEx above has 2 modifications:

  • ^ at the beginning, so the two \\ are only matched at the start of the string
  • [^\\] negated character class to say: the next character must not be another backslash
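
To see the difference, here is a small check of both patterns against inputs with two and three leading backslashes (variable names are made up for the example):

```php
<?php
$unanchored = '/\\\\\\\\/s';         // two backslashes anywhere in the string
$anchored   = '/^\\\\\\\\[^\\\\]/s'; // exactly two backslashes at the start

$two   = '\\\\server';   // 2 backslashes, then "server"
$three = '\\\\\\server'; // 3 backslashes, then "server"

var_dump(preg_match($unanchored, $two));   // int(1)
var_dump(preg_match($unanchored, $three)); // int(1)
var_dump(preg_match($anchored, $two));     // int(1)
var_dump(preg_match($anchored, $three));   // int(0)
```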

Regarding your last RegEx:

$regex = "/[\\][\\]/s";

You are mixing up the backslash escaping here (see above for clarification). "/[\\][\\]/s" is interpreted by PHP as /[\][\]/s. In the RegEx, \] is an escaped closing bracket, so the character class is never terminated and the pattern fails to compile. To put a literal \ into a character class it must be escaped for the RegEx as well.

This variant of your RegEx would work, but it also matches any occurrence of two backslashes, for the same reason I already explained above:

$regex = '/[\\\\][\\\\]/s';
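
A quick way to confirm this, as a sketch: preg_match() returns false (and raises a warning, suppressed here with @) when the pattern does not compile, and 1 when it matches. The variable names are illustrative:

```php
<?php
$broken = "/[\\][\\]/s";     // PCRE receives a pattern with an unterminated class
$fixed  = '/[\\\\][\\\\]/s'; // PCRE receives two classes, each holding a backslash

$path = '\\\\server';        // 2 backslashes, then "server"

// @ hides the compilation warning; a bad pattern makes preg_match() return false.
var_dump(@preg_match($broken, $path)); // bool(false)
var_dump(preg_match($fixed, $path));   // int(1)
```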

Upvotes: 6

hakre

Reputation: 197933

Echo your regex as well, so you can see the actual pattern. Writing those slashes inside PHP strings gets awkward quickly, and echoing lets you verify that the pattern is correct.

You should also put ^ at the beginning of the pattern to match from the start of the string and $ at the end, so that the whole string has to match.

\\server\something

Regex:

 ~^\\\\server\\something$~

PHP String:

$pattern = '~^\\\\\\\\server\\\\something$~';

For the repetition, you want to say that a server exists and that it's followed by one or more \something parts. If the server name follows the same rules as something, this can be simplified:

^\\(?:\\[a-z]+){2,}$

PHP String:

$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
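
As a rough check of the simplified pattern, the following sketch echoes it and runs it against a few sample paths (the sample paths are made up for the example; note that [a-z] only accepts lowercase letters):

```php
<?php
$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
echo $pattern, "\n"; // prints ~^\\(?:\\[a-z]+){2,}$~

$paths = array(
    '\\\\server\\something',      // server plus one folder
    '\\\\server\\something\\sub', // server plus two folders
    '\\\\server',                 // server only
    '\\\\\\server\\something',    // three leading backslashes
);

foreach ($paths as $path) {
    printf("%s => %s\n", $path, preg_match($pattern, $path) ? 'match' : 'no match');
}
// \\server\something => match
// \\server\something\sub => match
// \\server => no match
// \\\server\something => no match
```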

As there was some confusion about how \ characters should be written inside single quoted strings:

# Output:
#
# * Definition as '\\' ....... results in string(1) "\"
# * Definition as '\\\\' ..... results in string(2) "\\"
# * Definition as '\\\\\\' ... results in string(3) "\\\"

$slashes = array(
    '\\',
    '\\\\',
    '\\\\\\',
);

foreach($slashes as $i => $slashed) {
    $definition = sprintf('%s ', var_export($slashed, 1));
    ob_start();
    var_dump($slashed);
    $result = rtrim(ob_get_clean());    
    printf(" * Definition as %'.-12s results in %s\n", $definition, $result);
}

Upvotes: 3
