Reputation: 2002
I am trying to extract individual values from a string based on a format. This format can change, so ideally I want to specify it using another string.
For example, let's say my input is:
1. Line One - Part Two (Optional Third Part)
I would want to specify the format to match as:
%number%. %first% - %second% (%third%)
and then get these values as variables.
The only way I could think of doing this was using regex named groups, and I very nearly have it working:
var input = "1. Line One - Part Two (Optional Third Part)";
var formatString = "%number%. %first% - %second% (%third%)";
var expression = new Regex("(?<Number>[^.]+). (?<First>[^-]+) - (?<Second>[^\\(]+) ((?<Third>[^)]+))");
var match = expression.Match(input);
Console.WriteLine(match.Groups["Number"].ToString().Trim());
Console.WriteLine(match.Groups["First"].ToString().Trim());
Console.WriteLine(match.Groups["Second"].ToString().Trim());
Console.WriteLine(match.Groups["Third"].ToString().Trim());
This results in the following output, which is all good apart from that opening bracket:
1
Line One
Part Two
(Optional Third Part
I'm now a bit lost as to how I could translate my format string into a regular expression. There are no fixed rules for this format, but it would need to be fairly easy for a user to write.
Any advice is greatly appreciated. Or perhaps there is another way that doesn't involve regex?
Upvotes: 2
Views: 814
Reputation: 26926
Your format contains special characters that are becoming part of the regular expression. You can use the Regex.Escape method to handle that. After that, you can use Regex.Replace with a MatchEvaluator delegate to turn each %name% placeholder into a named group, transforming the format into a regular expression:
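using System.Text.RegularExpressions;
var input = "1. Line One - Part Two (Optional Third Part)";
var fmt = "%number%. %first% - %second% (%third%)";
// Matches each lowercase %name% placeholder; the name becomes the regex group name.
var templateRE = new Regex(@"%([a-z]+)%", RegexOptions.Compiled);
// Escape the literal parts first, then swap each placeholder for a lazy named group.
var pattern = templateRE.Replace(Regex.Escape(fmt), m => $"(?<{m.Groups[1].Value}>.+?)");
var ansRE = new Regex(pattern);
var ans = ansRE.Match(input);
Console.WriteLine(ans.Groups["number"].Value); // 1
Console.WriteLine(ans.Groups["third"].Value);  // Optional Third Part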
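To make the intermediate steps of this approach visible, the sketch below prints the escaped format and the generated pattern before matching. It reuses the format and input from the question; the expected strings in the comments rely on .NET's Regex.Escape, which escapes ., (, ) and spaces but leaves %, - and letters untouched.

```csharp
using System;
using System.Text.RegularExpressions;

var fmt = "%number%. %first% - %second% (%third%)";
var input = "1. Line One - Part Two (Optional Third Part)";

// Step 1: escape regex metacharacters in the literal parts of the format.
var escaped = Regex.Escape(fmt);
Console.WriteLine(escaped);  // %number%\.\ %first%\ -\ %second%\ \(%third%\)

// Step 2: replace each %name% placeholder with a lazy named group.
var pattern = Regex.Replace(escaped, "%([a-z]+)%", m => $"(?<{m.Groups[1].Value}>.+?)");
Console.WriteLine(pattern);  // (?<number>.+?)\.\ (?<first>.+?)\ -\ (?<second>.+?)\ \((?<third>.+?)\)

// Step 3: match the input and read each value back by its group name.
var match = Regex.Match(input, pattern);
foreach (var name in new[] { "number", "first", "second", "third" })
{
    Console.WriteLine($"{name}: {match.Groups[name].Value}");
}
```

Because the literal parts are escaped, the bracket around the third part is now matched as text rather than opening a group.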
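Note: You may want to place ^ and $ at the beginning and end of the pattern respectively, to ensure the format must match the entire input string. This matters most when the format ends with a placeholder: without a trailing $, the last lazy .+? group would capture only a single character.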
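The effect of anchoring is easiest to see with a format that ends with a placeholder. In this sketch, ToPattern is a hypothetical helper that builds the pattern the same way as the code above, and the %key% = %value% format and its input are made-up examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a regex from a %name% format like the answer does,
// optionally wrapping it in ^...$ so the whole input must match.
string ToPattern(string format, bool anchored)
{
    var body = Regex.Replace(Regex.Escape(format), "%([a-z]+)%", m => $"(?<{m.Groups[1].Value}>.+?)");
    return anchored ? "^" + body + "$" : body;
}

var fmt = "%key% = %value%";   // this format ends with a placeholder
var input = "name = John Smith";

var loose = Regex.Match(input, ToPattern(fmt, anchored: false));
var strict = Regex.Match(input, ToPattern(fmt, anchored: true));

// Unanchored: the lazy .+? of the last group stops after one character.
Console.WriteLine(loose.Groups["value"].Value);   // J
// Anchored: $ forces the last group to extend to the end of the input.
Console.WriteLine(strict.Groups["value"].Value);  // John Smith
```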
Upvotes: 0
Reputation: 37430
You included a couple of special characters in your pattern (such as . and the parentheses) without escaping them, so the regex does not match them literally.
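The sketch below illustrates what escaping changes, using short made-up inputs: an unescaped . matches any character, and unescaped parentheses form a group rather than matching ( and ), which is where the stray opening bracket in the question's output comes from.

```csharp
using System;
using System.Text.RegularExpressions;

// An unescaped '.' matches any character; an escaped "\." matches only a literal dot.
var anyChar = Regex.IsMatch("1x", @"^1.$");
var literalDot = Regex.IsMatch("1x", @"^1\.$");
Console.WriteLine(anyChar);     // True
Console.WriteLine(literalDot);  // False

// Unescaped parentheses create a group instead of matching '(' and ')'.
var input = "(Optional Third Part)";
var grouped = Regex.Match(input, @"((?<Third>[^)]+))");
var escaped = Regex.Match(input, @"\((?<Third>[^)]+)\)");
Console.WriteLine(grouped.Groups["Third"].Value);  // (Optional Third Part
Console.WriteLine(escaped.Groups["Third"].Value);  // Optional Third Part
```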
Here's a corrected version of your code:
using System.Text.RegularExpressions;
var input = "1. Line One - Part Two (Optional Third Part)";
var pattern = string.Format(
"(?<Number>{0})\\. (?<First>{1}) - (?<Second>{2}) \\((?<Third>{3})\\)",
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+");
var match = Regex.Match(input, pattern);
Console.WriteLine(match.Groups["Number"]);
Console.WriteLine(match.Groups["First"]);
Console.WriteLine(match.Groups["Second"]);
Console.WriteLine(match.Groups["Third"]);
If you want to keep your syntax, you can leverage the Regex.Escape method. I have also written some code that parses all parameters enclosed in %:
using System.Collections.Generic;
using System.Text.RegularExpressions;
var input = "1. Line One - Part Two (Optional Third Part)";
var formatString = "%number%. %first% - %second% (%third%)";
// Escape regex metacharacters (., (, ), spaces) so the literal parts match as-is.
formatString = Regex.Escape(formatString);
var parameters = new List<string>();
// Replace each %name% with a named group whose body is a string.Format slot ({0}, {1}, ...).
formatString = Regex.Replace(formatString, "%([^%]+)%", m =>
{
var paramName = m.Groups[1].Value;
var groupPattern = "(?<" + paramName + ">{" + parameters.Count + "})";
parameters.Add(paramName);
return groupPattern;
});
var pattern = string.Format(
formatString,
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+");
var match = Regex.Match(input, pattern);
foreach (var paramName in parameters)
{
Console.WriteLine(match.Groups[paramName]);
}
Further notes
You need to adjust the part where you specify the pattern for each group: currently it's not generic, and it doesn't check how many parameters there are.
So finally, taking all of this into account and cleaning up the code a little, you can use a solution like this:
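using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public static class FormatBasedCustomRegex
{
// Turns a %name% format into a regex pattern; subpatterns[i] is used for the i-th placeholder.
public static string GetPattern(this string formatString,
string[] subpatterns,
out string[] parameters)
{
formatString = Regex.Escape(formatString);
formatString = formatString.ReplaceParams(out var @params);
if(@params.Length != subpatterns.Length)
{
throw new ArgumentException(
"The number of subpatterns must equal the number of %name% parameters in the format.",
nameof(subpatterns));
}
parameters = @params;
return string.Format(
formatString,
subpatterns);
}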
private static string ReplaceParams(
this string formatString,
out string[] parameters)
{
var @params = new List<string>();
var outputPattern = Regex.Replace(formatString, "%([^%]+)%", match =>
{
var paramName = match.Groups[1].Value;
var groupPattern = "(?<" + paramName + ">{" + @params.Count + "})";
@params.Add(paramName);
return groupPattern;
});
parameters = @params.ToArray();
return outputPattern;
}
}
And the main method would look like this:
var input = "1. Line One - Part Two (Optional Third Part)";
var pattern = "%number%. %first% - %second% (%third%)".GetPattern(
new[]
{
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+",
},
out var parameters);
var match = Regex.Match(input, pattern);
foreach (var paramName in parameters)
{
Console.WriteLine(match.Groups[paramName]);
}
But it's up to you how you define the particular methods and what signatures they should have to best suit your code :)
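A possible variation, sketched under its own assumptions rather than taken from the answer: look each subpattern up by parameter name instead of by position, fall back to a lazy .+? for parameters without one, anchor the pattern with ^ and $, and only return values when match.Success is true, so a non-matching input is not silently read as empty groups. TryExtract and its parameters are hypothetical names. Substituting the subpatterns directly also avoids running the escaped format through string.Format, which would throw a FormatException if the user's format contained a literal { or }.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: matches `input` against a %name% format and returns the captured
// values keyed by parameter name. Parameters with no entry in `subpatterns` default to ".+?".
bool TryExtract(string format, string input,
    IReadOnlyDictionary<string, string> subpatterns,
    out Dictionary<string, string> values)
{
    var names = new List<string>();
    var body = Regex.Replace(Regex.Escape(format), "%([^%]+)%", m =>
    {
        var name = m.Groups[1].Value;
        names.Add(name);
        var sub = subpatterns.TryGetValue(name, out var p) ? p : ".+?";
        return $"(?<{name}>{sub})";
    });

    values = new Dictionary<string, string>();
    // Anchor with ^ and $ so the whole input has to fit the format.
    var match = Regex.Match(input, "^" + body + "$");
    if (!match.Success)
    {
        return false;
    }
    foreach (var key in names)
    {
        values[key] = match.Groups[key].Value;
    }
    return true;
}

var fmt = "%number%. %first% - %second% (%third%)";
var overrides = new Dictionary<string, string> { ["number"] = @"\d+" };  // digits only

var ok = TryExtract(fmt, "1. Line One - Part Two (Optional Third Part)", overrides, out var result);
Console.WriteLine(ok);                // True
Console.WriteLine(result["second"]);  // Part Two

// "A" does not satisfy \d+, so this reports failure instead of yielding empty strings.
var bad = TryExtract(fmt, "A. Line One - Part Two (Optional Third Part)", overrides, out _);
Console.WriteLine(bad);               // False
```

Keying the subpatterns by name means the caller no longer has to keep them in the same order as the placeholders, and parameters that need no special treatment can simply be left out of the dictionary.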
Upvotes: 2
Reputation: 785611
You may use this regex:
^(?<Number>[^.]+)\. (?<First>[^-]+) - (?<Second>[^(]+)(?: \((?<Third>[^)]+)\))?$
RegEx Details:
^ : Start
(?<Number>[^.]+) : Match and capture 1+ of any char that is not .
\.  : Match ". "
(?<First>[^-]+) : Match and capture 1+ of any char that is not -
 -  : Match " - "
(?<Second>[^(]+) : Match and capture 1+ of any char that is not (
(?: : Start a non-capture group
 \( : Match a space followed by (
(?<Third>[^)]+) : Match and capture 1+ of any char that is not )
\) : Match )
)? : End the optional non-capture group
$ : End
Upvotes: 1
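Unlike the generated patterns in the other answers, the optional (?: ...)? group here lets the bracketed third part be omitted. A quick way to check that is to run the regex against an input with and without it; the second input below is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The regex from this answer: the " (Third)" part sits in an optional non-capturing group.
var regex = new Regex(@"^(?<Number>[^.]+)\. (?<First>[^-]+) - (?<Second>[^(]+)(?: \((?<Third>[^)]+)\))?$");

var withThird = regex.Match("1. Line One - Part Two (Optional Third Part)");
var withoutThird = regex.Match("2. Line Two - Part Two");

Console.WriteLine(withThird.Groups["Second"].Value);     // Part Two
Console.WriteLine(withThird.Groups["Third"].Value);      // Optional Third Part

// When the optional group is skipped, Third reports Success == false and an empty Value.
Console.WriteLine(withoutThird.Success);                 // True
Console.WriteLine(withoutThird.Groups["Third"].Success); // False
```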