Michel Feinstein

Reputation: 14276

How to get identical Regex matches grouped together?

If I want to find all the text inside brackets in a string using a regex, I would have something like this:

string text = "[the] [quick] brown [fox] jumps over [the] lazy dog";
Regex regex = new Regex(@"\[([^]]+)\]");
MatchCollection matches = regex.Matches(text);

foreach (Match match in matches)
{
    // Here is my problem!
}

I am not sure how to continue my code from here. If I just iterate through all the matches, I get "the", "quick", "fox" and "the" as four separate matches. I was expecting the two "the" occurrences to be grouped in the same Match.Groups collection, just at different indexes.

What I really want is to group the two "the" matches so that I can find all occurrences of the same word together with their indexes.

I was hoping the API would let me do something like this:

foreach (Match match in matches)
{   
    for (int i = 1; i < match.Groups.Count; i++)
    {
        StartIndexesList.Add(match.Groups[i].Index);
    }
}

Here I expected each entry in match.Groups to refer to one occurrence of the same token in the text, so that this code would add the indexes of all the "the" occurrences to the list at once. It doesn't: it only adds the index of each separate occurrence, one match at a time.
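To see why the hoped-for loop cannot work, it helps to look at what Match.Groups actually contains: it is indexed by the capture groups defined in the pattern (Groups[0] is the whole match, Groups[1] is the first parenthesized group), not by repeated occurrences of the same text. The minimal sketch below uses hypothetical names (GroupsDemo, Describe) that are not part of .NET, and escapes the bracket inside the character class as `[^\]]` purely for readability. It reports, for every match, how many groups it has, where it starts, and which word it captured.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupsDemo
{
    // For each match: its Groups.Count, its start Index, and the captured word.
    // Groups[0] is the whole match ("[the]"); Groups[1] is capture group 1 ("the").
    public static (int GroupCount, int Index, string Word)[] Describe(string text)
    {
        Regex regex = new Regex(@"\[([^\]]+)\]");
        MatchCollection matches = regex.Matches(text);
        var result = new (int GroupCount, int Index, string Word)[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            Match match = matches[i];
            result[i] = (match.Groups.Count, match.Index, match.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string text = "[the] [quick] brown [fox] jumps over [the] lazy dog";
        foreach (var (groupCount, index, word) in Describe(text))
        {
            Console.WriteLine($"{word} at {index}: {groupCount} groups");
        }
    }
}
```

Every match reports exactly 2 groups, whether its word is repeated or not; the two "the" matches are simply two separate Match objects (starting at 0 and 37).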

How can I achieve this without post-processing all the tokens to see if there are repeated ones?

Upvotes: 0

Views: 38

Answers (1)

Wagner DosAnjos

Reputation: 6374

Is this what you are looking for?

using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "[the] [quick] brown [fox] jumps over [the] lazy dog";
Regex regex = new Regex(@"\[([^]]+)\]");
MatchCollection matches = regex.Matches(text);

foreach (IGrouping<string, Match> group in matches.Cast<Match>().GroupBy(m => m.Value))
{
    Console.WriteLine(group.Key);   // Prints each distinct match in turn: '[the]', '[quick]', '[fox]'

    foreach (Match match in group)  // Iterates through every occurrence of that match, e.g. both '[the]'
    {
        // do your stuff, e.g. use match.Index
    }
}
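If the goal is a map from each word to the indexes of all its occurrences, a minimal sketch (the names WordIndex and FindOccurrences are hypothetical, not part of .NET) can group on the captured word, Groups[1].Value, instead of the whole match, and collect each occurrence's Index into a list. This is still a pass over the matches after the regex has run: a single Match only ever describes one occurrence, so some grouping step like this appears unavoidable when using Regex.Matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WordIndex
{
    // Maps each bracketed word (capture group 1, without the brackets) to the
    // start indexes of all its matches, in order of appearance in the text.
    public static Dictionary<string, List<int>> FindOccurrences(string text)
    {
        Regex regex = new Regex(@"\[([^\]]+)\]");
        return regex.Matches(text)
            .Cast<Match>()
            .GroupBy(m => m.Groups[1].Value)
            .ToDictionary(g => g.Key, g => g.Select(m => m.Index).ToList());
    }

    public static void Main()
    {
        string text = "[the] [quick] brown [fox] jumps over [the] lazy dog";
        foreach (KeyValuePair<string, List<int>> entry in FindOccurrences(text))
        {
            Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value)}");
        }
    }
}
```

With the sample sentence, "the" maps to 0 and 37, which are the positions of its two "[" characters. Selecting m.Groups[1].Index instead of m.Index would give the position of the word itself (1 and 38).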

Upvotes: 1
