Brady

Reputation: 395

Remove Duplicate Captures from List

I'm new to .NET and not so great with regex, but with that said, I have the following code:

    var p = GetAllMatches(lines, @"^\s+?([A-Z]{1,2}[0-9]{2}) : |: ([A-Z]{1,2}[0-9]{2})")
                        .SelectMany(m => m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToList())
                        .ToList();


    private static List<Match> GetAllMatches(List<string> lines, string pattern, RegexOptions options=RegexOptions.None)
    {
        return lines
            .Select(l => Regex.Match(l, pattern, options))
            .Where(m => m.Success)
            .ToList(); 
    }

...which, I believe, captures the portion of a line that either follows " : " or precedes " :", where that portion is 1 or 2 uppercase letters followed by 2 digits.

So, for example, it should capture "C61, C62, C61" in the following block of text:

blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345

Main Storage : C61
C62 : 1215
C61 : 1785

blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345

So far so good. My question is this: how do I make it capture each distinct match only ONCE? In the above example, I'd like it to ultimately spit out "C61, C62" rather than "C61, C62, C61". Is this possible with regex, or should I manipulate the list after the regex is done with its capture? Either way, how would I approach it?

Thanks in advance for any help provided.

Upvotes: 2

Views: 177

Answers (2)

Mariano

Reputation: 6511

@Nefariis answered how to remove duplicates from a list, which is definitely what should be done here! It's faster, easier, cheaper, better.

I'll contribute the regex part in case you're wondering: yes, it can be done.

You're already capturing each token, so all you need to do is use a lookahead to check if "it's not followed by the same text" (using a backreference).

Regex:

(?: : (?<portion>[A-Z]{1,2}[0-9]{2})|^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) :)(?!.*(?: : \k<portion>|^\s*\k<portion> :))
       ^^^^^^^^^^                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 (same group, with a name)           (negative lookahead: it's not followed by the text captured in group <portion>)
  • Use RegexOptions.Singleline | RegexOptions.Multiline
  • Notice I'm using named groups.


Code:

    string input = "blablablabla12345b\nMain Storage : C61\nC62 : 1215\nC61 : 1785\nblablablabla12345blablablabla";

    string pattern = @"(?: : (?<portion>[A-Z]{1,2}[0-9]{2})|^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) :)(?!.*(?: : \k<portion>|^\s*\k<portion> :))";
    MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Singleline | RegexOptions.Multiline);

    // Print the token captured by whichever alternative matched.
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        Console.WriteLine(groups["portion"].Value);
    }
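One thing worth knowing about this approach: the lookahead rejects a token whenever the same token shows up again later in the input, so it is the last occurrence of each token that matches, not the first. Here is a small sketch of that, using the same pattern and options as above. The class and method names (LookaheadDedupSketch, UniqueTokens) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDedupSketch
{
    // Same pattern as in the answer, split over two lines for readability.
    private const string Pattern =
        @"(?: : (?<portion>[A-Z]{1,2}[0-9]{2})|^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) :)" +
        @"(?!.*(?: : \k<portion>|^\s*\k<portion> :))";

    // Illustrative helper: returns the surviving tokens in match order.
    public static List<string> UniqueTokens(string input)
    {
        return Regex.Matches(input, Pattern, RegexOptions.Singleline | RegexOptions.Multiline)
            .Cast<Match>()
            .Select(m => m.Groups["portion"].Value)
            .ToList();
    }

    public static void Main()
    {
        string input = "Main Storage : C61\nC62 : 1215\nC61 : 1785";
        // The first C61 is skipped because C61 appears again on the last line.
        Console.WriteLine(string.Join(", ", UniqueTokens(input))); // prints "C62, C61"
    }
}
```

So if the order of first appearance matters to you, that is one more reason to prefer removing duplicates from the list afterwards.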

ideone Demo

Upvotes: 3

Nefariis

Reputation: 3549

Try removing the duplicates from the list after the regex has run, using LINQ's Distinct:

list.Distinct().ToList();

http://www.dotnetperls.com/remove-duplicates-list
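Applied to the code in the question, that could look roughly like the sketch below. Treat it as one possible arrangement rather than a drop-in fix: the class and method names (DistinctCapturesSketch, GetDistinctTokens) are made up, and I'm assuming a named group shared by both alternatives plus `^\s*` instead of `^\s+?`, so that the token is picked up whichever side of " : " it is on and unindented lines still match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctCapturesSketch
{
    // Hypothetical helper: extracts one token per matching line,
    // then drops repeated values with Distinct().
    public static List<string> GetDistinctTokens(IEnumerable<string> lines)
    {
        // Assumption: both alternatives capture into the same named group,
        // which .NET allows, so a single lookup works for either side.
        const string pattern =
            @"^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) : | : (?<portion>[A-Z]{1,2}[0-9]{2})";

        return lines
            .Select(l => Regex.Match(l, pattern))
            .Where(m => m.Success)
            .Select(m => m.Groups["portion"].Value)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        var lines = new List<string> { "Main Storage : C61", "C62 : 1215", "C61 : 1785" };
        Console.WriteLine(string.Join(", ", GetDistinctTokens(lines)));
    }
}
```

On the three sample lines the result contains C61 and C62 once each. Distinct() compares strings by value, so no custom comparer is needed here.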

Upvotes: 2
