Reputation: 4440
The following code
string expression = "(\\{[0-9]+\\})";
RegexOptions options = ((RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline) | RegexOptions.IgnoreCase);
Regex tokenParser = new Regex(expression, options);
MatchCollection matches = tokenParser.Matches("The {0} is a {1} and the {2} is also a {1}");
will match and capture "{0}", "{1}", "{2}" and "{1}".
Is it possible to change it (either the regular expression or the RegexOptions) so that it matches and captures only "{0}", "{1}" and "{2}"? In other words, each distinct token should be captured only once.
Upvotes: 4
Views: 6894
Reputation: 75242
Here's something you could use for a pure regex solution:
Regex r = new Regex(@"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)",
RegexOptions.Singleline);
But for the sake of both efficiency and maintainability, you're probably better off with a mixed solution like the one you posted.
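To see what the lookbehind does, here is a minimal sketch that runs the pattern above against the sample string from the question. The class and method names (FirstOccurrenceDemo, FirstOccurrences) are made up for illustration. The idea is that (?<!\1.*\1) rejects a token whenever an identical token already appears earlier in the input, so only the first occurrence of each token should survive.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstOccurrenceDemo
{
    // Hypothetical helper: returns each token only the first time it appears.
    public static string[] FirstOccurrences(string input)
    {
        // Group 1 captures a {n} or [name] token. The negative lookbehind
        // then fails if the same text (\1) also occurs somewhere before it.
        var r = new Regex(@"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)",
                          RegexOptions.Singleline);
        return r.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```

Calling FirstOccurrenceDemo.FirstOccurrences("The {0} is a {1} and the {2} is also a {1}") should give "{0}", "{1}" and "{2}". Note that the lookbehind rescans the preceding text for every candidate match, which is part of why the mixed solution tends to be cheaper on long inputs.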
Upvotes: 2
Reputation: 4440
Here is what I came up with.
private static bool TokensMatch(string t1, string t2)
{
return TokenString(t1) == TokenString(t2);
}
private static string TokenString(string input)
{
Regex tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");
string[] tokens = tokenParser.Matches(input).Cast<Match>()
.Select(m => m.Value).Distinct().OrderBy(s => s).ToArray<string>();
return String.Join(String.Empty, tokens);
}
Note that the regular expression differs from the one in my question because I cater for two types of token: numbered ones delimited by {} and named ones delimited by [].
Upvotes: 5
Reputation: 64098
Regular expressions solve lots of problems, but not every problem. How about using other tools in the toolbox?
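// Cast<Match>() is needed where MatchCollection only implements the non-generic IEnumerable.
var parameters = new HashSet<string>(
    matches.Cast<Match>().Select(mm => mm.Value));
Or
var parameters = matches.Cast<Match>().Select(mm => mm.Value).Distinct();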
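For completeness, a self-contained sketch of the LINQ approach applied to the sample string from the question. The names DistinctTokensDemo and DistinctTokens are assumptions for illustration, not anything from the original code. The regex stays as simple as in the question and the de-duplication is left to Distinct().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctTokensDemo
{
    // Hypothetical helper: finds every {n} token, then drops the repeats.
    public static List<string> DistinctTokens(string input)
    {
        var tokenParser = new Regex(@"\{[0-9]+\}");
        return tokenParser.Matches(input)
            .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Value)   // keep only the matched text
            .Distinct()             // remove duplicate tokens
            .ToList();
    }
}
```

For "The {0} is a {1} and the {2} is also a {1}" this should return three tokens: "{0}", "{1}" and "{2}". If the order matters, add an explicit OrderBy rather than relying on the order Distinct() happens to produce.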
Upvotes: 3
Reputation: 598
If you only want one instance, change
string expression = "(\\{[0-9]+\\})"; // one or more digits
to
string expression = "(\\{[0-9]{1}\\})"; // exactly one digit
Upvotes: -2