Wojciech Szabowicz

Reputation: 4198

C#: get part of a path using regex

I am using a regex to extract specific information from a string. The string looks like this:

\subpath1\subpath2\subpathn\4xxxx_2xxxx\filename.extension
//there can be many subpaths and x is always a digit; the last directory is always number_number
//starting with 4, and the last part is always a file with an extension,
//so I want to exclude paths such as 4xxxx_xxxx/path/file.extension

So far I have come up with this regex: (?<=\)(4[0-9])_([0-9]).?." but:

Any suggestions on this one?

Upvotes: 1

Views: 367

Answers (4)

ΩmegaMan

Reputation: 31616

Because the pattern 4XXX_2.... is unique, just search for it. All we have to do is look for a "\4" and then leave the "\" out of the final output. Here is one way:

 \\(?<PostUrl>4[^_]+_2.+)

will put what you need into a match. It uses a named capture, (?<{Name Here}> ), so the match has this structure:

Match #0
                [0]:  \4xxxx_2xxxx\Extra\filename.extension
  ["PostUrl"] → [1]:  4xxxx_2xxxx\Extra\filename.extension
        →1 Captures:  4xxxx_2xxxx\Extra\filename.extension

So we can get the text "4xxxx_2xxxx\Extra\filename.extension" from either

myMatch.Groups["PostUrl"].Value or myMatch.Groups[1].Value
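As a minimal sketch of the named-capture access described above (the sample path and its digits are made up for illustration; only the pattern comes from this answer):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample path; the digits stand in for the x placeholders.
string input = @"\subpath1\subpath2\subpathn\41234_25678\Extra\filename.extension";

// Same pattern as above: a literal backslash, then the named group "PostUrl".
Match myMatch = Regex.Match(input, @"\\(?<PostUrl>4[^_]+_2.+)");

// The group can be read by name or by index; both return the same text.
string byName = myMatch.Groups["PostUrl"].Value;
string byIndex = myMatch.Groups[1].Value;

Console.WriteLine(myMatch.Success);   // True
Console.WriteLine(byName);            // 41234_25678\Extra\filename.extension
Console.WriteLine(byName == byIndex); // True
```

Value is already a string, so no further conversion is needed.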


If there is a purist out there who says there could be a preceding "\4..." pattern, then specify the regex option RightToLeft to ensure that it finds the last "4X" pattern.

Upvotes: 0

Mong Zhu

Reputation: 23732

Here is an alternative approach:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
string path = "subpath1/subpath2/subpathn/41234_23456/excludePath/filename.extension";
// First path segment that contains 4<digits>_<digits>
string importantDirectory = path.Split('/').First(x => Regex.IsMatch(x, @"4\d+_\d+"));
string fileName = Path.GetFileName(path);
string result = Path.Combine(importantDirectory, fileName);
Console.WriteLine(result);

41234_23456\filename.extension

Upvotes: 1

dovid

Reputation: 6472

A. Four digits = [0-9]{4} or \d{4} or \d\d\d\d. If the number can be shorter or longer, use + for "one or more": \d+_\d+

B. The path delimiter is a backslash in the example and a slash in the comment. The backslash must be escaped with another backslash (escaping the slash is harmless in .NET and required in some other flavors); use [\/\\] to accept both.

C. If the file name must have an extension, the expression needs one or more valid file-name characters, a dot, and again one or more valid file-name characters, such as \w+\.\w+. Add \b so the match stops at a word boundary after the extension.

Note that what counts as a valid file name varies from system to system (Mac or Windows, for example), and is in any case wider than \w, which covers essentially only letters, digits, and the underscore.

My suggestion:

\d+_\d+[\/\\]\w+\.\w+\b

https://regex101.com/r/Ed2H0u/1

C# code:

    var textInput = @"
\subpath1\subpath2\subpathn\4123_21253\filename.extension
\subpath2\subpathn\4123_21253\subpathn\filename.extension
";

    var matches = Regex.Matches(textInput, @"\b[\w\/\\]+[\/\\](\d+_\d+)[\/\\](\w+\.\w+)\b");
    foreach (Match element in matches)
    {
        Console.WriteLine("Path: " + element.Value);
        Console.WriteLine("Number: " + element.Groups[1].Value);
        Console.WriteLine("FileName: " + element.Groups[2].Value);
    }

https://dotnetfiddle.net/V87CKc
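Following the note above that real file names are wider than \w, here is one possible variant, offered only as a sketch. It assumes that a file name is any run of characters other than path separators, that the directory must start with 4 as in the question, and that the file name ends the string. The sample paths and the file name "my file-v2.tar.gz" are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: file name = non-separator characters, a dot, then an
// extension with no separators or dots; $ requires it to end the string.
var pattern = new Regex(@"[\/\\](4\d+_\d+)[\/\\]([^\/\\]+\.[^\/\\.]+)$");

string direct = @"\subpath1\subpathn\4123_21253\my file-v2.tar.gz";
string nested = @"\subpath2\subpathn\4123_21253\subpathn\filename.extension";

Match m = pattern.Match(direct);
Console.WriteLine(m.Groups[1].Value);               // 4123_21253
Console.WriteLine(m.Groups[2].Value);               // my file-v2.tar.gz
Console.WriteLine(pattern.IsMatch(nested));         // False
```

The second path is rejected because an extra directory sits between the number_number directory and the file.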

Upvotes: 0

Wiktor Stribiżew

Reputation: 626871

You can use

(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+

See the regex demo.

Details:

  • (?<=\\) - a positive lookbehind that requires a \ char to appear immediately to the left of the current location
  • (4[0-9]*) - Group 1: 4 and then zero or more ASCII digits
  • _ - an underscore
  • ([0-9]*) - Group 2: any zero or more ASCII digits
  • \\ - a \ char
  • [^\\]+ - one or more chars other than \
  • \. - a dot
  • \w+ - one or more word chars.
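A short sketch of how the pattern above could be used from C#; the two sample paths are invented, one of the shape the question wants and one with an extra directory that should be excluded:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+");

// Hypothetical inputs: the second has a directory between 4xxxx_2xxxx and the file.
string wanted = @"\subpath1\subpath2\subpathn\41234_25678\filename.extension";
string excluded = @"\subpath1\41234_25678\path\filename.extension";

Match hit = regex.Match(wanted);
Match miss = regex.Match(excluded);

Console.WriteLine(hit.Value);           // 41234_25678\filename.extension
Console.WriteLine(hit.Groups[1].Value); // 41234
Console.WriteLine(hit.Groups[2].Value); // 25678
Console.WriteLine(miss.Success);        // False
```

The pattern is not anchored at the end, so if the file name must be the last thing in the string, appending $ is one option; that would be an addition to the pattern shown here.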

Upvotes: 2
