bwall

Reputation: 1060

Get the middle part of a filename using regex

I need a regex that can return up to 10 characters in the middle of a file name.

filename:                          returns:
msl_0123456789_otherstuff.csv ->   0123456789
msl_test.xml                  ->   test
anythingShort.w1              ->   anythingSh

I can capture the beginning and end for removal with the following regex:

Regex.Replace(filename, "(^msl_)|([.][[:alnum:]]{1,3}$)", string.Empty); *

but I also need to have only 10 characters when I am done.

Explanation of the regex above:

Upvotes: 2

Views: 316

Answers (2)

The fourth bird

Reputation: 163362

Using Replace with the alternation removes either alternative from the start or the end of the string, but it also succeeds when the extension is not present, and it does not limit the number of characters left in the middle.
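To make that concrete, here is a minimal sketch of the replace-only approach. It assumes the question's pattern with [[:alnum:]] swapped for [^\W_] so that it runs in .NET; the StripEnds helper and the msl_noextension input are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Hypothetical helper: removes a leading "msl_" and a trailing
    // 1-3 character extension, but does nothing about the length in between.
    public static string StripEnds(string fileName) =>
        Regex.Replace(fileName, @"^msl_|\.[^\W_]{1,3}$", string.Empty);

    public static void Main()
    {
        // The middle part is longer than 10 characters after the replace.
        Console.WriteLine(StripEnds("msl_0123456789_otherstuff.csv")); // 0123456789_otherstuff

        // No extension present: the prefix is still removed.
        Console.WriteLine(StripEnds("msl_noextension")); // noextension
    }
}
```

So the replace on its own still needs a second step (or a different pattern) to cap the result at 10 characters.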

If the file extension must be present, you might use a capturing group and make msl_ optional at the beginning.

The group then captures 1 to 10 word characters other than _, and the rest of the pattern matches any remaining word characters up to the dot, followed by the extension.

^(?:msl_)?([^\W_]{1,10})\w*\.[^\W_]{2,}$

.NET regex demo (Click on the table tab)
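A short sketch of how this first pattern could be used from C#. The MiddlePart helper and the msl_my-file.csv and no_extension inputs are assumptions added for illustration; returning an empty string for "no match" is just one possible convention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiddlePartDemo
{
    private static readonly Regex Pattern =
        new Regex(@"^(?:msl_)?([^\W_]{1,10})\w*\.[^\W_]{2,}$");

    // Hypothetical helper: returns group 1 (at most 10 letters/digits),
    // or an empty string when the file name does not match the pattern.
    public static string MiddlePart(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(MiddlePart("msl_0123456789_otherstuff.csv")); // 0123456789
        Console.WriteLine(MiddlePart("msl_test.xml"));                  // test
        Console.WriteLine(MiddlePart("anythingShort.w1"));              // anythingSh

        // No dot and extension, so there is no match.
        Console.WriteLine(MiddlePart("no_extension") == string.Empty);    // True

        // A hyphen is not a word character, so this name is rejected too.
        Console.WriteLine(MiddlePart("msl_my-file.csv") == string.Empty); // True
    }
}
```

The last case shows why this pattern is fairly strict: any character outside \w before the dot prevents a match.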


A somewhat broader pattern uses \S instead of \w and matches up to the last dot:

^(?:msl_)?(\S{1,10})\S*\.[^\W_]{2,}$

See another regex demo | C# demo

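using System;
using System.Text.RegularExpressions;

// The last entry has no extension, so it does not match and prints nothing.
string[] strings = {"msl_0123456789_otherstuff.csv", "msl_test.xml", "anythingShort.w1", "123456testxxxxxxxx"};
string pattern = @"^(?:msl_)?(\S{1,10})\S*\.[^\W_]{2,}$";
foreach (string s in strings)
{
    Match match = Regex.Match(s, pattern);
    if (match.Success)
    {
        // Group 1 holds up to 10 characters after the optional msl_ prefix.
        Console.WriteLine(match.Groups[1].Value);
    }
}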

Output

0123456789
test
anythingSh

Upvotes: 2

Wiktor Stribiżew

Reputation: 626893

Note that [[:alnum:]] does not work in a .NET regex, because .NET does not support POSIX character classes. You may use \w (to match letters, digits, and underscores) or [^\W_] (to match letters or digits) instead.
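The difference between the two replacements can be checked with a small sketch like the one below. The helper names IsWordChar and IsLetterOrDigit are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // True when the whole string is a single \w character.
    public static bool IsWordChar(string s) => Regex.IsMatch(s, @"^\w$");

    // True when the whole string is a single \w character other than "_".
    public static bool IsLetterOrDigit(string s) => Regex.IsMatch(s, @"^[^\W_]$");

    public static void Main()
    {
        foreach (string s in new[] { "a", "7", "_", "." })
        {
            Console.WriteLine("{0} => \\w: {1}, [^\\W_]: {2}",
                s, IsWordChar(s), IsLetterOrDigit(s));
        }
    }
}
```

Only the underscore is treated differently here: \w accepts it, while [^\W_] rejects it.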

You can use your regex and just keep the first 10 chars in the string:

new string(Regex.Replace(s, @"^msl_|\.\w{1,3}$","").Take(10).ToArray())

See the C# demo online:

var strings = new List<string> { "msl_0123456789_otherstuff.csv", "msl_test.xml", "anythingShort.w1" };
foreach (var s in strings) 
{
    Console.WriteLine("{0} => {1}", s, new string(Regex.Replace(s, @"^msl_|\.\w{1,3}$","").Take(10).ToArray()));
}

Output:

msl_0123456789_otherstuff.csv => 0123456789
msl_test.xml => test
anythingShort.w1 => anythingSh

Upvotes: 3
