Darren Rose

Reputation: 179

Excluding duplicates from a string in results

I am trying to amend this regex so that it does not match duplicates.

Current regex:

[\""].+?[\""]|[^ ]+

Sample string:

".doc" "test.xls", ".doc","me.pdf", "test file.doc"

Expected results:

".doc"
"test.xls"
"me.pdf"

But not

".doc"
"test.xls"
".doc"
"me.pdf"

Note:

  1. Filenames could potentially have spaces e.g. test file.doc
  2. Items could be separated by a space, a comma, or both
  3. Strings may or may not be wrapped in quotes, e.g. .doc or ".doc".

Upvotes: 4

Views: 184

Answers (1)

Wiktor Stribiżew

Reputation: 626960

In C#, you may use a simple regex to extract all valid matches and use .Distinct() to only keep unique values.

The regex is simple:

"(?<ext>[^"]+)"|(?<ext>[^\s,]+)

See the regex demo; only the values of the "ext" group are needed.

Details

  • "(?<ext>[^"]+)" - an opening ", then (group "ext") one or more chars other than ", and then a closing "
  • | - or
  • (?<ext>[^\s,]+) - (group "ext") 1+ chars other than whitespace and comma
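To see how the two alternatives feed the same group, here is a minimal sketch that runs the pattern on one quoted and one unquoted item. FirstExt is a hypothetical helper added only for illustration; it is not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Both alternatives capture into the same named group "ext",
// which .NET allows within a single pattern.
var pattern = "\"(?<ext>[^\"]+)\"|(?<ext>[^\\s,]+)";

// Hypothetical helper: returns the "ext" value of the first match in the input.
string FirstExt(string input) => Regex.Match(input, pattern).Groups["ext"].Value;

Console.WriteLine(FirstExt("\"test file.doc\"")); // quoted item: spaces kept, quotes dropped
Console.WriteLine(FirstExt(".doc,"));             // unquoted item: stops before the comma
```

The first call should print test file.doc and the second .doc, so the caller never has to check which alternative matched.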

The C# code snippet:

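using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "\".doc\" \"test.xls\", \".doc\",\"me.pdf\", \"test file.doc\".doc \".doc\"";
Console.WriteLine(text); // => ".doc" "test.xls", ".doc","me.pdf", "test file.doc".doc ".doc"
// Both alternatives capture into the same named group "ext".
var pattern = "\"(?<ext>[^\"]+)\"|(?<ext>[^\\s,]+)";
var results = Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(x => x.Groups["ext"].Value) // the item without its surrounding quotes
        .Distinct();                        // keep each value only once
Console.WriteLine(string.Join("\n", results));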

Output:

.doc
test.xls
me.pdf
test file.doc
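Distinct() compares strings by exact, case-sensitive equality, so .doc and .DOC would both be kept. If the items should be treated as case-insensitive (an assumption; the question does not say so), a comparer can be passed in. A minimal sketch, where UniqueIgnoreCase is a hypothetical helper name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = "\"(?<ext>[^\"]+)\"|(?<ext>[^\\s,]+)";

// Hypothetical helper, not from the answer: de-duplicates while ignoring case.
string[] UniqueIgnoreCase(string input) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups["ext"].Value)
        .Distinct(StringComparer.OrdinalIgnoreCase) // ".doc" and ".DOC" count as one value
        .ToArray();

var unique = UniqueIgnoreCase("\".doc\" .DOC, \"Me.pdf\" me.PDF");
Console.WriteLine(string.Join("\n", unique)); // two entries remain
```

In current .NET implementations the first spelling encountered is the one that survives, although the documentation does not promise an ordering.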

Upvotes: 1
