John Kline Kurtz

Reputation: 855

Get Expression in Named Capture

I am providing a text box where users can enter a regular expression to match filenames. I plan to detect any named capture groups they provide with the Regex.GetGroupNames() method.

I want to get the expression that they entered inside each named capture group.

As an example, they might enter a regular expression like this:

December (?<FileYear>\d{4}) Records\.xlsx

Is there a method or means to get the sub-expression \d{4} apart from manually parsing the regular expression string?

Upvotes: 5

Views: 131

Answers (3)

John Kline Kurtz

Reputation: 855

Here is a solution that uses a regular expression to match the named capturing groups inside another regular expression. The idea comes from the post "Using RegEx to balance match parenthesis":

\(\?\<(?<MyGroupName>\w+)\>
(?<MyExpression>
((?<BR>\()|(?<-BR>\))|[^()]*)+
)
\)

The multi-line form above only works with RegexOptions.IgnorePatternWhitespace. On a single line it is:

\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)

Using it might look like this:

string sGetCaptures = @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";
MatchCollection MC = Regex.Matches(txtFromUser.Text, sGetCaptures );
foreach (Match M in MC)
{
    string sGroupName = M.Groups["MyGroupName"].Value;
    string sSubExpression = M.Groups["MyExpression"].Value;
    //Do what I need to do with the sub-expression
    MessageBox.Show(sGroupName + ":" + sSubExpression);
}

For the example in the original question, the message box shows FileYear:\d{4}
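
The point of the balancing group (BR) is that the sub-expression may itself contain parentheses. Here is a minimal sketch of that case, reusing the same pattern string; the class name BalancedCaptureDemo and the method FirstSubExpression are hypothetical names chosen for illustration, and the nested input is an assumed example rather than one from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCaptureDemo
{
    // Same pattern as in the answer: (?<BR>\() pushes on every "(" and
    // (?<-BR>\)) pops on every ")" that closes a nested group.
    private const string GetCaptures =
        @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";

    // Hypothetical helper: returns the sub-expression of the first named
    // group found in userPattern, or an empty string if there is none.
    public static string FirstSubExpression(string userPattern)
    {
        Match m = Regex.Match(userPattern, GetCaptures);
        return m.Success ? m.Groups["MyExpression"].Value : string.Empty;
    }
}
```

For an input such as December (?<FileYear>(19|20)\d{2}) Records\.xlsx, the intent is that the whole (19|20)\d{2} is captured rather than stopping at the first closing parenthesis. Escaped parentheses such as \( are still counted as real ones by this pattern, so treat it as a starting point.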

Upvotes: 0

ΩmegaMan

Reputation: 31616

The pattern (?<=\(\?<\w+\>)([^)]+) returns the sub-expression of each named capture group, without the group name. It uses a positive lookbehind to make sure the matched text is preceded by (?<...>. Because [^)]+ stops at the first closing parenthesis, it does not handle nested groups.


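// Requires: using System.Linq; using System.Text.RegularExpressions;
string data = @"December (?<FileYear>\d{4}) Records\.xlsx";
string pattern = @"(?<=\(\?<\w+\>)([^)]+)";

// Collect the text of every match (the sub-expression of each named group)
var expressions = Regex.Matches(data, pattern)
     .OfType<Match>()
     .Select(mt => mt.Groups[0].Value)
     .ToList();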

This returns a single item:

\d{4}

An input such as (?<FileMonth>[^\s]+)\s+(?<FileYear>\d{4}) Records\.xlsx returns two matches:

[^\s]+

\d{4}
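
If the group name is needed alongside each sub-expression, one option is to capture the name instead of hiding it in a lookbehind. The following is a minimal sketch under that assumption; the class NamedGroupLister, the method List, and the inner group names name and expr are hypothetical, and it has the same limitation regarding nested parentheses:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupLister
{
    // Hypothetical helper: maps each named group in userPattern to its
    // sub-expression. [^)]+ stops at the first ")", so nested groups
    // are not supported.
    public static Dictionary<string, string> List(string userPattern)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(userPattern, @"\(\?<(?<name>\w+)>(?<expr>[^)]+)\)"))
        {
            result[m.Groups["name"].Value] = m.Groups["expr"].Value;
        }
        return result;
    }
}
```

Calling NamedGroupLister.List on the two-group input above should yield one entry for FileMonth and one for FileYear.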

Upvotes: 0

John Kline Kurtz

Reputation: 855

Here is an ugly brute-force extension method that parses the pattern string directly, without using another Regex to detect the sub-expression (or subpattern):

    public static string GetSubExpression(this Regex pRegex, string pCaptureName)
    {
        string sRegex = pRegex.ToString();
        string sGroupText = @"(?<" + pCaptureName + ">";
        int iGroupStart = sRegex.IndexOf(sGroupText);
        if (iGroupStart < 0) return null; // no such named group in the pattern
        string sRemainder = sRegex.Substring(iGroupStart + sGroupText.Length);
        string sThis;
        string sPrev = "";
        int iOpenParenCount = 0;
        int iEnd = 0;
        for (int i = 0; i < sRemainder.Length; i++)
        {
            sThis = sRemainder.Substring(i, 1);
            if (sThis == ")" && sPrev != @"\" && iOpenParenCount == 0)
            {
                iEnd = i;
                break;
            }
            else if (sThis == ")" && sPrev != @"\")
            {
                iOpenParenCount--;
            }
            else if (sThis == "(" && sPrev != @"\")
            {
                iOpenParenCount++;
            }
            sPrev = sThis;
        }
        return sRemainder.Substring(0, iEnd);
    }

The usage looks like this:

    Regex reFromUser = new Regex(txtFromUser.Text);
    string[] asGroupNames = reFromUser.GetGroupNames();
    int iItsInt;
    foreach (string sGroupName in asGroupNames)
    {
        if (!Int32.TryParse(sGroupName, out iItsInt)) //don't want numbered groups
        {
            string sSubExpression = reFromUser.GetSubExpression(sGroupName);
            //Do what I need to do with the sub-expression
        }
    }

If you would also like to generate test or sample data, you can use the NuGet package "Fare" as follows once you have a sub-expression:

            //Generate test data for it
            Fare.Xeger X = new Fare.Xeger(sSubExpression);
            string sSample = X.Generate();

Upvotes: 1
