Reputation: 179
I am trying to amend this regex so that it does not match duplicates.
Current regex:
[\""].+?[\""]|[^ ]+
Sample string:
".doc" "test.xls", ".doc","me.pdf", "test file.doc"
Expected results:
".doc"
"test.xls"
"me.pdf"
But not
".doc"
"test.xls"
".doc"
"me.pdf"
Note:
test file.doc
.doc
or ".doc".
Upvotes: 4
Views: 184
Reputation: 626960
In C#, you may use a simple regex to extract all valid matches and use .Distinct()
to only keep unique values.
The regex is simple:
"(?<ext>[^"]+)"|(?<ext>[^\s,]+)
See the regex demo; you only need the values of the group named "ext".
Details
- "(?<ext>[^"]+)" - a ", then (group "ext") any 1+ chars other than ", and then a "
- | - or
- (?<ext>[^\s,]+) - (group "ext") 1+ chars other than whitespace and comma

The C# code snippet:
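using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "\".doc\" \"test.xls\", \".doc\",\"me.pdf\", \"test file.doc\".doc \".doc\"";
Console.WriteLine(text); // => ".doc" "test.xls", ".doc","me.pdf", "test file.doc".doc ".doc"
// Both alternatives capture into the same named group "ext" (.NET allows duplicate group names)
var pattern = "\"(?<ext>[^\"]+)\"|(?<ext>[^\\s,]+)";
var results = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(x => x.Groups["ext"].Value) // value without the surrounding quotes
    .Distinct();                        // keep each value only once
Console.WriteLine(string.Join("\n", results));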
Output:
.doc
test.xls
me.pdf
test file.doc
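To see why the "ext" group is what makes .Distinct() work here, it may help to compare the whole match with the group value. The sketch below is only illustrative: the class name ExtDemo and the helper Pairs are made up for this example, and the input is a shortened version of the sample string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtDemo
{
    // Same pattern as above: both alternatives capture into the group named "ext".
    private const string Pattern = "\"(?<ext>[^\"]+)\"|(?<ext>[^\\s,]+)";

    // Hypothetical helper: returns (whole match, "ext" group value) for every match.
    public static List<(string Whole, string Ext)> Pairs(string text) =>
        Regex.Matches(text, Pattern)
             .Cast<Match>()
             .Select(m => (m.Value, m.Groups["ext"].Value))
             .ToList();

    public static void Main()
    {
        // Prints one "whole match -> ext value" line per match.
        foreach (var (whole, ext) in Pairs("\".doc\" .doc \"test file.doc\""))
            Console.WriteLine($"{whole} -> {ext}");
    }
}
```

The whole match keeps the quotes when the first alternative matched, so ".doc" (quoted) and .doc (unquoted) would count as two different strings. The group value is .doc in both cases, which is why the duplicates collapse once .Distinct() is applied to the group values.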
Upvotes: 1