myusrn

Reputation: 1110

Regex match includes what is supposed to be a non-capturing group

I have the following simple test where I'm trying to write a Regex pattern that extracts the executable name without the ".exe" suffix.

It appears my non-capturing group (?:\.exe) isn't working, or I'm misunderstanding how it's intended to work.

Both regex101 and regexstorm.net show the same result, and the former confirms that "(?:\.exe)" is a non-capturing group.

Any thoughts on what I'm doing wrong?

// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase).Value;
// expecting "MyApp" but I get "MyApp.exe"

I have been able to extract the value I wanted by using a pattern with named groups, as shown in the following, but I would like to understand why the non-capturing group approach didn't work the way I expected it to.

// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"(?<fname>[^\\]+)(?<ext>\.exe)", 
    RegexOptions.IgnoreCase).Groups["fname"].Value;
// get the desired "MyApp" result

/eoq

Upvotes: 5

Views: 4402

Answers (3)

Wiktor Stribiżew

Reputation: 626845

A (?:...) is a non-capturing group: it still matches and consumes text, it just does not create a numbered capture group. This means the text this group matches is still added to the overall match value.
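A minimal sketch of what "still consumes" means in practice, assuming C# 9+ top-level statements and reusing the sample input from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input copied from the question.
var input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// (?:\.exe) is matched and consumed, so it is part of m.Value,
// but it does not add a numbered group to m.Groups.
var m = Regex.Match(input, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase);

Console.WriteLine(m.Value);        // MyApp.exe
Console.WriteLine(m.Groups.Count); // 1 (only group 0, the whole match)
```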

In general, if you want to match something without consuming it, you need to use lookarounds. So, if you need to match something that is followed by a specific string, use a positive lookahead, the (?=...) construct:

some_pattern(?=specific string) // if specific string comes immediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
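A short sketch of both lookahead forms, again assuming the question's sample string; the literal Debug is just a convenient word taken from that path:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// (?=\.exe) only checks that ".exe" comes next; it consumes nothing,
// so the match stops right before the extension.
var immediate = Regex.Match(input, @"[^\\]+(?=\.exe)", RegexOptions.IgnoreCase);
Console.WriteLine(immediate.Value); // MyApp

// "Debug" is followed by ".exe" somewhere later, but not immediately.
var anywhere = Regex.Match(input, @"Debug(?=.*\.exe)", RegexOptions.IgnoreCase);
var adjacent = Regex.Match(input, @"Debug(?=\.exe)", RegexOptions.IgnoreCase);
Console.WriteLine(anywhere.Success); // True
Console.WriteLine(adjacent.Success); // False
```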

If you need to require some specific text before the match but exclude it from the match, use a positive lookbehind:

(?<=specific string)some_pattern // if specific string comes immediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
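A sketch that combines a fixed-width lookbehind with a lookahead, so that neither the preceding backslash nor the .exe suffix ends up in the match (the sample input is assumed from the question):

```csharp
using System;
using System.Text.RegularExpressions;

var input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// (?<=\\) requires a backslash right before the match and (?=\.exe)
// requires ".exe" right after it; neither is included in m.Value.
var m = Regex.Match(input, @"(?<=\\)[^\\]+(?=\.exe)", RegexOptions.IgnoreCase);
Console.WriteLine(m.Value); // MyApp
```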

Note that lookbehind patterns containing quantifiers such as *, +, ?, {2,} or even {1,3} (for example, .*? or .*) are not supported by every regex engine; luckily, the .NET regex engine used by C# does support them. They are also supported by the Python PyPI regex module, Vim, JGSoft software and, now, by ECMAScript 2018 compliant JavaScript environments.
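To illustrate the variable-length case, here is a sketch that relies on .NET accepting .*? inside a lookbehind. The literals myprj (present in the sample path) and release (absent from it) are chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// Variable-length lookbehind: "myprj" may appear anywhere before the match.
var m = Regex.Match(input, @"(?<=myprj.*?)\w+(?=\.exe)", RegexOptions.IgnoreCase);
Console.WriteLine(m.Value); // MyApp

// "release" never occurs before the candidate text, so nothing matches.
var none = Regex.Match(input, @"(?<=release.*?)\w+(?=\.exe)", RegexOptions.IgnoreCase);
Console.WriteLine(none.Success); // False
```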

In this case, you can capture the part you need and just match the surrounding context without capturing it:

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty; 
var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
    asmName = m.Groups[1].Value;
}
Console.WriteLine(asmName);

See the C# demo

Details

  • ([^\\]+) - Capturing group 1: one or more chars other than \
  • \. - a literal dot
  • exe - a literal exe substring.

Since we are only interested in the contents of capturing group #1, we grab m.Groups[1].Value, and not the whole m.Value (which contains .exe).

Upvotes: 9

Joe Sewell

Reputation: 6610

You're using a non-capturing group. The emphasis is on the word group here: the group does not capture the .exe, but the regex as a whole still matches it, so it is still part of the match value.

You probably want a positive lookahead, which just asserts that the text following the current position must meet a criterion for the match to be valid, without including that text in the match.

In other words, you want (?=, not (?:, at the start of your group.

The distinction between a capturing and a non-capturing group only shows up if you enumerate the Groups property of the Match object; in your case, you're just using the Value property, so there's no difference between a normal group (\.exe) and a non-capturing group (?:\.exe).

To see the distinction, consider this test program:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var positiveInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
        Test(positiveInput, @"[^\\]+(\.exe)");    // capturing group
        Test(positiveInput, @"[^\\]+(?:\.exe)");  // non-capturing group
        Test(positiveInput, @"[^\\]+(?=\.exe)");  // positive lookahead

        var negativeInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.dll\" /?";
        Test(negativeInput, @"[^\\]+(?=\.exe)");
    }

    // Prints the overall match and every group for the given pattern.
    static void Test(string input, string pattern)
    {
        Console.WriteLine($"Input: {input}");
        Console.WriteLine($"Regex pattern: {pattern}");

        var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

        if (match.Success)
        {
            Console.WriteLine("Matched: " + match.Value);
            for (int i = 0; i < match.Groups.Count; i++)
            {
                Console.WriteLine($"Groups[{i}]: {match.Groups[i]}");
            }
        }
        else
        {
            Console.WriteLine("No match.");
        }
        Console.WriteLine("---");
    }
}

The output of this is:

Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
Groups[1]: .exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?:\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?=\.exe)
Matched: MyApp
Groups[0]: MyApp
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.dll" /?
Regex pattern: [^\\]+(?=\.exe)
No match.
---

The first regex (@"[^\\]+(\.exe)") has \.exe as a normal capturing group. When we enumerate the Groups property, we see that .exe is indeed captured as a group. (Note that the entire match is always exposed as group 0, hence Groups[0] is equal to Value.)

The second regex (@"[^\\]+(?:\.exe)") is the one provided in your question. The only difference compared to the previous scenario is that the Groups property doesn't contain .exe as one of its entries.

The third regex (@"[^\\]+(?=\.exe)") is the one I'm suggesting you use. Now the .exe part of the input isn't consumed by the regex at all, but the regex still won't match unless the matched text is immediately followed by .exe, as the fourth scenario illustrates.
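One way to see that the lookahead is zero-width is to check where the match ends. This sketch (assuming the question's sample input) shows the match stopping exactly at the start of .exe, which is left untouched in the input:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var m = Regex.Match(input, @"[^\\]+(?=\.exe)", RegexOptions.IgnoreCase);

// The lookahead consumes nothing: the match ends exactly where ".exe" begins.
int end = m.Index + m.Length;
Console.WriteLine(m.Value);                 // MyApp
Console.WriteLine(input.Substring(end, 4)); // .exe
Console.WriteLine(end == input.IndexOf(".exe", StringComparison.OrdinalIgnoreCase)); // True
```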

Upvotes: 3

marvel308

Reputation: 10458

The non-capturing group is still matched (and included in the overall match); it just isn't captured. So if you want only the part before .exe, put that part in a capturing group and read the group instead of the whole match.

You can access the group like this:

var m = Regex.Match(testEcl, @"([^\\]+)(?:\.exe)", RegexOptions.IgnoreCase);
var asmName = m.Groups[1].Value; // "MyApp"

the demo for the regex can be found here

Upvotes: 1
