Reputation: 20048
I want to find words inside a text, where a word consists only of characters from a preselected set.
For example, I use a regex to split on the characters that are not in the set and remove the empty entries.
Like this:
string inp = @"~T!@#e$мудак%š^t<>is69&.,;((טעראָר))_+}{{男子}[죽은]ที่เดิน:?/Ök\|`'+*-¤=";
string[] reg = {"[^A-Za-zšžõäöüŠŽÕÄÖÜ]"};
foreach (string word in inp.Split(reg, StringSplitOptions.RemoveEmptyEntries))
    Console.Write(word + " ");
The output I am trying to get is:
T e š t is Ök
Upvotes: 2
Views: 654
Reputation: 354576
You want Regex.Split(String, String) instead of String.Split(String[], StringSplitOptions); the latter treats each separator as a literal string and does no regex matching.
Kind of like the following (tested):
string inp = @"~T!@#e$мудак%š^t<>is69&.,;((טעראָר))_+}{{男子}[죽은]ที่เดิน:?/Ök\|`'+*-¤=";
string reg = "[^A-Za-zšžõäöüŠŽÕÄÖÜ]";
foreach (string word in Regex.Split(inp, reg))
    if (word != string.Empty)
        Console.Write(word + " ");
PowerShell test:
PS> $inp = '~T!@#e$мудак%š^t<>is69&.,;((טעראָר))_+}{{男子}[죽은]ที่เดิน:?/Ök\|`''+*-¤='
PS> $inp -split '[^A-Za-zšžõäöüŠŽÕÄÖÜ]' -join ' '
T e š t is Ök
The plain split leaves an empty string between every pair of adjacent separator characters, so you need to filter those out, either with
PS> $inp -split '[^A-Za-zšžõäöüŠŽÕÄÖÜ]' -ne '' -join ' '
T e š t is Ök
or
PS> $inp -split '[^A-Za-zšžõäöüŠŽÕÄÖÜ]+' -join ' '
T e š t is Ök
(although the latter still yields an empty item at the start and at the end, because the input begins and ends with separator characters ... ah well, I'll leave that to you.)
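One way to sidestep the empty entries altogether is to match runs of allowed characters instead of splitting on the disallowed ones. Below is a minimal sketch of that idea, assuming the same character set as in the question (just without the leading ^). The names WordExtractor and ExtractWords are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordExtractor
{
    // Positive character class: one or more allowed letters in a row form a word.
    // Same set as in the question, but without the negating '^'.
    private static readonly Regex WordPattern = new Regex("[A-Za-zšžõäöüŠŽÕÄÖÜ]+");

    // Hypothetical helper: returns every run of allowed characters, in input order.
    // Since only matches are collected, no empty strings can appear in the result.
    public static List<string> ExtractWords(string input)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(input))
            words.Add(m.Value);
        return words;
    }
}
```

Calling it as Console.Write(string.Join(" ", WordExtractor.ExtractWords(inp))) with the inp string from the question should print the words without any extra filtering step.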
Upvotes: 6
Reputation: 55573
This is what you want (tested):
string inp = @"~T!@#e$мудак%š^t<>is69&.,;((טעראָר))_+}{{男子}[죽은]ที่เดิน:?/Ök\|`'+*-¤=";
Regex reg = new Regex("[^A-Za-zšžõäöüŠŽÕÄÖÜ]");
foreach (string s in reg.Split(inp))
{
    if (String.IsNullOrEmpty(s))
        continue;
    Console.Write(s + " ");
}
Upvotes: 1