Reputation: 855
I am providing a textbox where a user can enter a regular expression to match filenames. I plan to detect any named capture groups that they provide with the Regex method GetGroupNames().
I want to get the expression that they entered inside each named capture group.
As an example, they might enter a regular expression like this:
December (?<FileYear>\d{4}) Records\.xlsx
Is there a method or means to get the sub-expression \d{4} apart from manually parsing the regular expression string?
Upvotes: 5
Views: 131
Reputation: 855
Here is a solution that uses a regular expression to match the named capturing groups inside another regular expression. The idea comes from the post "Using RegEx to balance match parenthesis":
\(\?\<(?<MyGroupName>\w+)\>
(?<MyExpression>
((?<BR>\()|(?<-BR>\))|[^()]*)+
)
\)
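or, on a single line (the multi-line form above needs RegexOptions.IgnorePatternWhitespace, otherwise the line breaks are treated as literal characters):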
\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)
Using it might look like this:
string sGetCaptures = @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";
MatchCollection MC = Regex.Matches(txtFromUser.Text, sGetCaptures);
foreach (Match M in MC)
{
    string sGroupName = M.Groups["MyGroupName"].Value;
    string sSubExpression = M.Groups["MyExpression"].Value;
    //Do what I need to do with the sub-expression
    MessageBox.Show(sGroupName + ":" + sSubExpression);
}
For the example in the original question, the message box would show FileYear:\d{4}
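As a minimal sketch of the same idea outside a WinForms form, the pattern can be wrapped in a small helper and run from a console program. The class and method names (`CaptureGroupFinder`, `Find`) and the second sample pattern are made up for illustration and are not part of the original answer. The second call suggests why the BR balancing group is there: a nested, unnamed group inside the named group comes back intact.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupFinder
{
    // Same pattern as in the answer above; BR acts as a stack for nested parentheses.
    private const string GetCaptures =
        @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";

    // Hypothetical helper: returns (group name, sub-expression) pairs
    // found in the text of a user-supplied pattern.
    public static List<KeyValuePair<string, string>> Find(string userPattern)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(userPattern, GetCaptures))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["MyGroupName"].Value,
                m.Groups["MyExpression"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Prints FileYear:\d{4}
        foreach (var pair in Find(@"December (?<FileYear>\d{4}) Records\.xlsx"))
        {
            Console.WriteLine(pair.Key + ":" + pair.Value);
        }

        // Prints Ext:(xlsx|csv)
        foreach (var pair in Find(@"(?<Ext>(xlsx|csv)) file"))
        {
            Console.WriteLine(pair.Key + ":" + pair.Value);
        }
    }
}
```

Returning the pairs instead of showing a message box keeps the helper usable from any kind of application; the caller decides what to do with each sub-expression.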
Upvotes: 0
Reputation: 31616
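The pattern (?<=\(\?<\w+\>)([^)]+) will give you the expression inside every named capture group, without the group name. It uses a positive lookbehind to make sure the matched text has a (?<...> before it. Because [^)]+ stops at the first closing parenthesis, a group that itself contains parentheses is cut short.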
string data = @"December (?<FileYear>\d{4}) Records\.xlsx";
string pattern = @"(?<=\(\?<\w+\>)([^)]+)";
var expressions = Regex.Matches(data, pattern)
    .OfType<Match>()
    .Select(mt => mt.Groups[0].Value); // requires using System.Linq
yields one item:
\d{4}
Data such as (?<FileMonth>[^\s]+)\s+(?<FileYear>\d{4}) Records\.xlsx
returns two matches:
[^\s]+
\d{4}
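If the group names are needed alongside the expressions, one possible variation is to capture the name instead of hiding it in a lookbehind. The sketch below is an assumption layered on this answer, not part of it: the pattern and the `NamedGroupLister.List` helper are hypothetical. Like the original pattern, it stops at the first closing parenthesis, so it only suits groups that contain no parentheses of their own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupLister
{
    // Hypothetical variation: capture the name and the expression in one match.
    private const string Pattern = @"\(\?<(?<name>\w+)>(?<expr>[^)]+)\)";

    // Maps each group name to the text of its sub-expression.
    public static Dictionary<string, string> List(string userPattern)
    {
        var map = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(userPattern, Pattern))
        {
            map[m.Groups["name"].Value] = m.Groups["expr"].Value;
        }
        return map;
    }

    public static void Main()
    {
        var map = List(@"(?<FileMonth>[^\s]+)\s+(?<FileYear>\d{4}) Records\.xlsx");
        foreach (var pair in map)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```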
Upvotes: 0
Reputation: 855
Here is an ugly brute-force extension method that finds the sub-expression (or subpattern) by scanning the pattern text, without using another Regex:
// Extension method: declare it inside a static class.
// Needs using System; and using System.Text.RegularExpressions;
public static string GetSubExpression(this Regex pRegex, string pCaptureName)
{
    string sRegex = pRegex.ToString();
    string sGroupText = @"(?<" + pCaptureName + ">";
    int iGroupAt = sRegex.IndexOf(sGroupText, StringComparison.Ordinal);
    if (iGroupAt < 0)
    {
        // Group is not written as (?<name>...), e.g. the (?'name'...) syntax
        return null;
    }
    string sRemainder = sRegex.Substring(iGroupAt + sGroupText.Length);
    int iOpenParenCount = 0;
    int iEnd = 0;
    for (int i = 0; i < sRemainder.Length; i++)
    {
        char cThis = sRemainder[i];
        if (cThis == '\\')
        {
            // Skip the escaped character so \( \) and \\ are never counted
            i++;
        }
        else if (cThis == '(')
        {
            iOpenParenCount++;
        }
        else if (cThis == ')')
        {
            if (iOpenParenCount == 0)
            {
                // This is the paren that closes the named group
                iEnd = i;
                break;
            }
            iOpenParenCount--;
        }
    }
    // Note: parentheses inside a character class such as [()] are still counted
    return sRemainder.Substring(0, iEnd);
}
The usage looks like this:
Regex reFromUser = new Regex(txtFromUser.Text);
string[] asGroupNames = reFromUser.GetGroupNames();
int iItsInt;
foreach (string sGroupName in asGroupNames)
{
    if (!Int32.TryParse(sGroupName, out iItsInt)) //don't want numbered groups
    {
        string sSubExpression = reFromUser.GetSubExpression(sGroupName);
        //Do what I need to do with the sub-expression
    }
}
Now, if you would like to generate test or sample data, you can use the NuGet package called "Fare" in the following way after you get a sub-expression:
//Generate test data for it
Fare.Xeger X = new Fare.Xeger(sSubExpression);
string sSample = X.Generate();
Upvotes: 1