Reputation: 4198
I am using a regex to extract specific information from a string. The string value looks like this:
\subpath1\subpath2\subpathn\4xxxx_2xxxx\filename.extension
//there can be many subpaths and x is always a digit; the last directory is always number_number
//and starts with 4, and the last part is always a file with an extension
//so I want to exclude paths such as 4xxxx_xxxx/path/file.extension
So far, using regex, I have come up with this construction (?<=\)(4[0-9])_([0-9]).?." but:
Any suggestions on this one?
Upvotes: 1
Views: 367
Reputation: 31616
Because the pattern 4XXX_2.... is unique, just search on that. All we have to do is look for a "\4", then ignore the "\" in the final output. Here is one way:
\\(?<PostUrl>4[^_]+_2.+)
This will get what you need into a match. We are using a "named capture", (?<{Name Here}> ), so the match structure has this information:
Match #0
[0]: \4xxxx_2xxxx\Extra\filename.extension
["PostUrl"] → [1]: 4xxxx_2xxxx\Extra\filename.extension
→1 Captures: 4xxxx_2xxxx\Extra\filename.extension
So we can get the match "4xxxx_2xxxx\Extra\filename.extension" by either
myMatch.Groups["PostUrl"].Value.ToString()
or myMatch.Groups[1].Value.ToString()
If there is a purist out there who says that there could be a preceding "\4..." pattern, then specify the regex option RightToLeft to ensure that it finds the last "4X" pattern.
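As a rough illustration of the named capture described above, here is a minimal sketch. The class name NamedCaptureDemo, the helper ExtractPostUrl, and the sample path are made up for this example; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedCaptureDemo
{
    // Hypothetical helper: returns the text captured by the "PostUrl" group,
    // or an empty string when the pattern does not match.
    public static string ExtractPostUrl(string path)
    {
        Match m = Regex.Match(path, @"\\(?<PostUrl>4[^_]+_2.+)");
        return m.Success ? m.Groups["PostUrl"].Value : string.Empty;
    }

    public static void Main()
    {
        // Sample path invented for illustration.
        string path = @"\subpath1\subpath2\41234_23456\Extra\filename.extension";
        Console.WriteLine(ExtractPostUrl(path));
    }
}
```

Note that Groups["PostUrl"].Value is already a string, so no further conversion is needed to use it.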
Upvotes: 0
Reputation: 23732
Here is an alternative approach:
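using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
string path = "subpath1/subpath2/subpathn/41234_23456/excludePath/filename.extension";
// First path segment that consists of 4<digits>_<digits> (anchored so the whole segment must match)
string importantDirectory = path.Split('/').First(x => Regex.IsMatch(x, @"^4\d+_\d+$"));
string fileName = Path.GetFileName(path);
string result = Path.Combine(importantDirectory, fileName);
Console.WriteLine(result);
41234_23456\filename.extension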
Upvotes: 1
Reputation: 6472
A.
4 digits = [0-9]{4} OR \d{4} OR \d\d\d\d
If the number can be short or long, use + for "one or more": \d+_\d+
B.
The path delimiter in the example is a backslash, and in the comment example a slash. The backslash must be escaped with a preceding backslash (escaping the slash is optional in .NET, but harmless); use [\/\\] to cover both formats.
C.
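If the file name must have an extension, the expression needs one or more valid file-name characters, a dot, and again one or more valid file-name characters, such as \w+\.\w+. Use \b so that the match ends on a word boundary at the end of the string/path.
Note that what counts as a valid file name varies from system to system (Mac or Windows, for example), and is in any case wider than \w, which covers only letters, digits, and the underscore (a-zA-Z0-9_, plus other Unicode letters and digits in .NET unless RegexOptions.ECMAScript is set).
My suggestion: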
\d+_\d+[\/\\]\w+\.\w+\b
https://regex101.com/r/Ed2H0u/1
C# code:
var textInput = @"
\subpath1\subpath2\subpathn\4123_21253\filename.extension
\subpath2\subpathn\4123_21253\subpathn\filename.extension
";
var matches = Regex.Matches(textInput, @"\b[\w\/\\]+[\/\\](\d+_\d+)[\/\\](\w+\.\w+)\b");
foreach (Match element in matches)
{
    Console.WriteLine("Path: " + element.Value);
    Console.WriteLine("Number: " + element.Groups[1].Value);
    Console.WriteLine("FileName: " + element.Groups[2].Value);
}
https://dotnetfiddle.net/V87CKc
Upvotes: 0
Reputation: 626871
You can use
(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+
See the regex demo.
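To try the pattern from C#, a minimal sketch could look like the following. The class name LookbehindDemo, the helper MatchTail, and both sample paths are assumptions made for illustration; only the pattern is taken from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // The pattern from the answer, written as a C# verbatim string.
    private static readonly Regex Pattern =
        new Regex(@"(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+");

    // Hypothetical helper: returns the matched tail of the path,
    // or an empty string when the path does not match.
    public static string MatchTail(string path)
    {
        Match m = Pattern.Match(path);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Sample paths invented for illustration.
        string[] samples =
        {
            @"\subpath1\subpath2\subpathn\41234_23456\filename.extension",
            @"\subpath1\41234_23456\excludePath\filename.extension"
        };

        foreach (string sample in samples)
        {
            string tail = MatchTail(sample);
            Console.WriteLine(tail.Length > 0 ? "Match: " + tail : "No match: " + sample);
        }
    }
}
```

With these inputs the first path yields 41234_23456\filename.extension and the second yields no match, because the segment that follows the number_number directory contains no dot.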
Details:
(?<=\\) - a positive lookbehind that requires a \ char to appear immediately to the left of the current location
(4[0-9]*) - Group 1: 4 and then zero or more ASCII digits
_ - an underscore
([0-9]*) - Group 2: any zero or more ASCII digits
\\ - a \ char
[^\\]+ - one or more chars other than \
\. - a dot
\w+ - one or more word chars.
Upvotes: 2