Reputation: 395
I'm new to .NET and not so great with regex, but with that said, I have the following code:
var p = GetAllMatches(lines, @"^\s+?([A-Z]{1,2}[0-9]{2}) : |: ([A-Z]{1,2}[0-9]{2})")
    .SelectMany(m => m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToList())
    .ToList();

private static List<Match> GetAllMatches(List<string> lines, string pattern, RegexOptions options = RegexOptions.None)
{
    return lines
        .Select(l => Regex.Match(l, pattern, options))
        .Where(m => m.Success)
        .ToList();
}
...which, I believe, captures portions of a string that either start with " : " and are followed by 1 or 2 alpha characters and 2 numerals, or portions of a string that end with " :" and are preceded by 1 or 2 alpha characters and 2 numerals.
So, for example, it should capture "C61, C62, C61" in the following block of text:
blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345
Main Storage : C61
C62 : 1215
C61 : 1785blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345blablablabla12345
So far so good. My question is this: how do I make it capture a specific match only ONCE? In the above example, I'd like it to ultimately spit out "C61, C62" rather than "C61, C62, C61". Is this possible with regex, or should I manipulate the list after the regex is done with its capture? Either way, how would I approach it?
Thanks in advance for any help provided.
Upvotes: 2
Views: 177
Reputation: 6511
@Nefarrii answered how to remove duplicates from a list, which is definitely what should be done here! It's faster, easier, cheaper, better.
I'll contribute the regex part in case you're wondering: yes, it can be done.
You're already capturing each token, so all you need to do is use a lookahead to check if "it's not followed by the same text" (using a backreference).
Regex:
(?: : (?<portion>[A-Z]{1,2}[0-9]{2})|^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) :)(?!.*(?: : \k<portion>|^\s*\k<portion> :))
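Both alternatives capture into the same named group, <portion>. The trailing negative lookahead, (?!...), rejects a match if the text captured in <portion> appears again later in the input, so only the last occurrence of each value is matched.
Options: RegexOptions.Singleline | RegexOptions.Multiline (Singleline lets . in the lookahead cross line breaks; Multiline lets ^ match at the start of each line.)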
Code:
// Requires: using System; using System.Text.RegularExpressions;
string input = "blablablabla12345b\nMain Storage : C61\nC62 : 1215\nC61 : 1785\nblablablabla12345blablablabla";
string pattern = @"(?: : (?<portion>[A-Z]{1,2}[0-9]{2})|^\s*(?<portion>[A-Z]{1,2}[0-9]{2}) :)(?!.*(?: : \k<portion>|^\s*\k<portion> :))";

// Singleline: '.' also matches '\n'; Multiline: '^' matches at the start of each line.
MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Singleline | RegexOptions.Multiline);

foreach (Match match in matches)
{
    // Only the last occurrence of each value matches, so this prints C62, then C61.
    GroupCollection groups = match.Groups;
    Console.WriteLine(groups["portion"].Value);
}
Upvotes: 3
Reputation: 3549
Try calling Distinct() on the list once the regex has finished capturing (this needs using System.Linq;):
list.Distinct().ToList();
http://www.dotnetperls.com/remove-duplicates-list
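For illustration, here is a minimal sketch of that call applied to the values from the question. The class name DistinctDemo and the helper RemoveDuplicates are hypothetical names made up for this example, and the input list is hard-coded rather than produced by the regex. In current .NET implementations Distinct() yields each value in order of first appearance, but the documentation does not promise an ordering, so treat the order as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical helper: wraps the Distinct().ToList() call suggested above.
    public static List<string> RemoveDuplicates(IEnumerable<string> values)
    {
        // Distinct() compares strings with the default equality comparer
        // (ordinal and case-sensitive).
        return values.Distinct().ToList();
    }

    public static void Main()
    {
        // Hard-coded stand-in for the list the regex code would produce.
        var captured = new List<string> { "C61", "C62", "C61" };

        List<string> unique = RemoveDuplicates(captured);

        // Writes the two remaining values, comma-separated.
        Console.WriteLine(string.Join(", ", unique));
    }
}
```

If the order of the results matters to you and you would rather not rely on implementation behavior, adding each value to a HashSet<string> while looping and keeping only those for which Add returns true is one alternative.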
Upvotes: 2