codemonkeytony

Reputation: 190

RegEx to find Windows file paths inside of text

I have been bashing my head against this one for a few hours and I just can't seem to crack it.

I have been tasked with writing an application that loops through numerous config files to identify any valid Windows file or folder paths within the text.

e.g.:

\\10.0.0.1\folder\
\\10.0.0.1\folder\filename.txt

\\servername\folder\
\\servername\folder\filename.txt

d:\folder\
d:\folder\filename.txt

I am using C# and here is the closest working version I've got so far

// Requires: using System.Text.RegularExpressions;
string ex = @"(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$";
var rx = new Regex(ex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);
var matches = rx.Matches(output);

foreach(Match m in matches)
{
    // handle each matched path here, e.g. inspect m.Value
}

You can see work in progress here

It does exactly what I need for the "d:\" paths, but for paths that start with "\\" it only works if the path is at the start of the string.

Ideally, getting just the folder paths returned, excluding the file name, would be an added bonus.

Any help appreciated.

Upvotes: 1

Views: 1501

Answers (1)

anubhava

Reputation: 785156

You may use this regex to capture folder and filename in 2 separate capture groups:

(?:\\\\[^\\]+|[a-zA-Z]:)((?:\\[^\\]+)+\\)?([^<>:]*)

RegEx Demo

RegEx Details:

  • (?:\\\\[^\\]+|[a-zA-Z]:): Non-capturing group that matches either a server name or IP address (\\ followed by 1+ non-\ characters) OR a drive letter followed by a :
  • ((?:\\[^\\]+)+\\)?: 1st capture group for the folder path. It matches a \ followed by 1+ non-\ characters, allows multiple occurrences of that, and ends with a \. The group is optional because of the trailing ?.
  • ([^<>:]*): 2nd capture group for the filename. It matches 0 or more of any character that is not <, > or :
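
As a rough sketch of how this pattern could be used from C#, the snippet below applies it one line at a time and rebuilds the folder path by trimming the filename (group 2) off the end of the full match. The helper name FindPaths, the line-by-line splitting and the sample text are assumptions made for illustration and are not part of the regex itself. Splitting by line is there because [^\\]+ and [^<>:]* would otherwise match across newlines. It is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name and shape are assumptions, not from the answer).
// Runs the pattern on each line separately so a match cannot spill onto the next line.
static List<(string Folder, string File)> FindPaths(string text)
{
    var rx = new Regex(@"(?:\\\\[^\\]+|[a-zA-Z]:)((?:\\[^\\]+)+\\)?([^<>:]*)");
    var results = new List<(string Folder, string File)>();

    foreach (var rawLine in text.Split('\n'))
    {
        var line = rawLine.TrimEnd('\r'); // tolerate Windows line endings

        foreach (Match m in rx.Matches(line))
        {
            // Group 2 is the filename (may be empty for folder-only paths).
            string file = m.Groups[2].Value;

            // Everything before the filename is root + folder, e.g. d:\folder\
            string folder = m.Value.Substring(0, m.Value.Length - file.Length);

            results.Add((folder, file));
        }
    }

    return results;
}

// Made-up config text, one key=value pair per line.
string sample = string.Join("\n",
    @"backup=\\10.0.0.1\folder\filename.txt",
    @"share=\\servername\folder\",
    @"log=d:\folder\filename.txt");

foreach (var p in FindPaths(sample))
{
    Console.WriteLine($"folder: {p.Folder}  file: {p.File}");
}
```

For the first sample line this reports the folder as \\10.0.0.1\folder\ and the file as filename.txt. For the second line the file part is empty, which is how a folder-only path shows up.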

Upvotes: 1
