Steve Crane

Reputation: 4440

Preventing duplicate matches in RegEx

The following code

string expression = "(\\{[0-9]+\\})";
RegexOptions options = ((RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline) | RegexOptions.IgnoreCase);
Regex tokenParser = new Regex(expression, options);

MatchCollection matches = tokenParser.Matches("The {0} is a {1} and the {2} is also a {1}");

will match and capture "{0}", "{1}", "{2}" and "{1}".

Is it possible to change it (either the regular expression or the options passed to the Regex) so that it matches and captures only "{0}", "{1}" and "{2}"? In other words, each distinct token should be captured only once.

Upvotes: 4

Views: 6894

Answers (4)

Alan Moore

Reputation: 75242

Here's something you could use for a pure regex solution:

Regex r = new Regex(@"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)",
                    RegexOptions.Singleline);

But for the sake of both efficiency and maintainability, you're probably better off with a mixed solution like the one you posted.
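As a rough sketch of how that pattern behaves, the snippet below runs it against the sample string from the question. The class and method names (FirstOccurrenceDemo, FirstOccurrences) are made up for illustration. The negative lookbehind rejects a token when the same text already appears earlier in the input, so only the first occurrence of each token should match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstOccurrenceDemo
{
    // Hypothetical helper: returns each token only at its first occurrence.
    public static string[] FirstOccurrences(string input)
    {
        // (?<!\1.*\1) fails when the captured token also occurs earlier in the input.
        var r = new Regex(@"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)",
                          RegexOptions.Singleline);

        // Cast<Match>() keeps this working on .NET Framework,
        // where MatchCollection is not generic.
        return r.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string[] tokens = FirstOccurrences("The {0} is a {1} and the {2} is also a {1}");
        Console.WriteLine(string.Join(", ", tokens)); // prints: {0}, {1}, {2}
    }
}
```

Because the lookbehind rescans the text before every candidate match, this is likely to be slower on long inputs than de-duplicating the matches afterwards.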

Upvotes: 2

Steve Crane

Reputation: 4440

Here is what I came up with.

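// Two strings match when they contain the same set of distinct tokens.
private static bool TokensMatch(string t1, string t2)
{
  return TokenString(t1) == TokenString(t2);
}

// Builds a canonical string from the distinct tokens in the input, sorted.
// Requires: using System.Linq; using System.Text.RegularExpressions;
private static string TokenString(string input)
{
  Regex tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");

  string[] tokens = tokenParser.Matches(input).Cast<Match>()
      .Select(m => m.Value).Distinct().OrderBy(s => s).ToArray();

  return String.Join(String.Empty, tokens);
}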

Note that the regular expression differs from the one in my question because I cater for two types of token: numbered ones delimited by {} and named ones delimited by [].

Upvotes: 5

user7116

Reputation: 64098

Regular expressions solve lots of problems, but not every problem. How about using other tools in the toolbox?

// Cast<Match>() is needed on .NET Framework, where MatchCollection is not generic.
var parameters = new HashSet<string>(
    matches.Cast<Match>().Select(mm => mm.Value));

Or

var parameters = matches.Cast<Match>().Select(mm => mm.Value).Distinct();
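For completeness, here is a minimal self-contained sketch of the Distinct() approach applied to the string from the question. The names DistinctTokensDemo and DistinctTokens are invented for this example, and the pattern is the numbered-token one from the question without its outer capture group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctTokensDemo
{
    // Hypothetical helper: finds every {n} token, then drops repeats with LINQ.
    public static string[] DistinctTokens(string input)
    {
        var tokenParser = new Regex(@"\{[0-9]+\}");

        return tokenParser.Matches(input)
            .Cast<Match>()          // MatchCollection is non-generic on .NET Framework
            .Select(m => m.Value)   // keep only the matched text
            .Distinct()             // remove duplicate tokens
            .ToArray();
    }

    public static void Main()
    {
        string[] tokens = DistinctTokens("The {0} is a {1} and the {2} is also a {1}");
        Console.WriteLine(tokens.Length); // prints: 3
    }
}
```

The documentation for Distinct() describes its result as unordered, so if a particular order matters it is safer to sort explicitly, for example with OrderBy.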

Upvotes: 3

mcauthorn

Reputation: 598

If you only want one instance, change

string expression = "(\\{[0-9]+\\})"; // one or more repetitions

to

string expression = "(\\{[0-9]{1}\\})";  // exactly 1 repetition

Upvotes: -2
