Reputation: 6259
I am looking for a regular expression to split a text into its words. I have tried
Regex.Split(text, @"\s+")
But for example, for the input
this (is a) text. and
this gives me:
this
(is
a)
text.
and
But I am looking for a solution that gives me only the words, without the (, ), . etc. It should also split a text like
end.begin
into two words.
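A minimal, self-contained reproduction of the attempt above may help. The class and method names (`WhitespaceSplitDemo`, `SplitOnWhitespace`) are hypothetical. It shows that splitting on whitespace only leaves the punctuation attached to the neighbouring words, and that "end.begin" is not split at all:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSplitDemo
{
    // Splits on runs of whitespace only, so "(", ")" and "." stay attached to the words.
    public static string[] SplitOnWhitespace(string text) => Regex.Split(text, @"\s+");

    public static void Main()
    {
        // Prints: this, (is, a), text., and (one per line)
        foreach (var part in SplitOnWhitespace("this (is a) text. and"))
        {
            Console.WriteLine(part);
        }

        // "end.begin" contains no whitespace, so it comes back as a single element.
        Console.WriteLine(SplitOnWhitespace("end.begin").Length);
    }
}
```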
Upvotes: 0
Views: 1366
Reputation: 97696
You're probably better off matching the words rather than splitting.
If you use Split (with \W as Regexident suggested), you can get extra empty strings at the beginning and end. For example, the input string (a b) would give you four outputs: "", "a", "b", and another "", because the ( and ) are treated as separators.
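A small sketch of that effect, assuming a hypothetical helper `NonWordSplitDemo.SplitOnNonWord` that simply calls `Regex.Split` with `\W`. The leading ( and trailing ) each act as a separator with nothing before or after them, which is where the empty strings come from:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonWordSplitDemo
{
    // Every single non-word character is a separator, so a separator at the very
    // start or end of the input produces an empty string at that end of the result.
    public static string[] SplitOnNonWord(string text) => Regex.Split(text, @"\W");

    public static void Main()
    {
        string[] parts = SplitOnNonWord("(a b)");
        // Prints: [], [a], [b], []
        Console.WriteLine(string.Join(", ", Array.ConvertAll(parts, p => "[" + p + "]")));
    }
}
```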
What you probably want to do is just match the words. You can do that like this:
Regex.Matches(text, "\\w+").Cast<Match>().Select(match => match.Value)
Then you'll get just the words, and no extra empty strings at the beginning and end.
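Put into a complete program, the matching approach might look like the sketch below. The wrapper name `WordMatcher.GetWords` is hypothetical, and the `using` directives are what the one-liner needs for `Cast`, `Select` and `Regex`. In .NET, `\w` matches letters, decimal digits and connector punctuation such as the underscore, so both inputs from the question come out as plain words:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Returns each maximal run of word characters. Matching (unlike Regex.Split)
    // never produces empty strings, because every match has at least one character.
    public static List<string> GetWords(string text) =>
        Regex.Matches(text, @"\w+")
             .Cast<Match>()
             .Select(match => match.Value)
             .ToList();

    public static void Main()
    {
        // Prints: this|is|a|text|and
        Console.WriteLine(string.Join("|", GetWords("this (is a) text. and")));
        // Prints: end|begin
        Console.WriteLine(string.Join("|", GetWords("end.begin")));
    }
}
```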
Upvotes: 0
Reputation: 20394
You can do:
var text = "this (is a) text. and";
// replace the unwanted punctuation characters with a space
text = System.Text.RegularExpressions.Regex.Replace(text, "[(),.]", " ");
// split on whitespace (a null separator array means any whitespace) and drop empty entries
var tokens = text.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
    Console.WriteLine(token);
}
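The same replace-then-split idea, wrapped in a helper and applied to both inputs from the question, could be sketched as follows. `PunctuationSplitter.Split` is a hypothetical name. Note that only the characters listed in the class `[(),.]` are removed, so any other punctuation would have to be added to that class:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuationSplitter
{
    // Replaces ( ) , and . with spaces, then splits on whitespace.
    // Other punctuation (e.g. ! or ?) is NOT handled unless added to the class.
    public static string[] Split(string text)
    {
        string cleaned = Regex.Replace(text, "[(),.]", " ");
        // A null separator array means "split on any whitespace character".
        return cleaned.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Prints: this|is|a|text|and
        Console.WriteLine(string.Join("|", Split("this (is a) text. and")));
        // Prints: end|begin
        Console.WriteLine(string.Join("|", Split("end.begin")));
    }
}
```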
Upvotes: 0
Reputation: 29552
Try this:
Regex.Split(text, @"\W+")
\W is the counterpart (negation) of \w, which matches word characters: letters, digits and connector punctuation such as the underscore.
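Splitting on `\W+` still yields an empty string when the input starts or ends with a non-word character (for example "(a b)"). A minimal sketch, assuming you want to keep `Regex.Split` rather than switch to matching, filters those out with LINQ. The name `NonWordRunSplitter.SplitWords` is hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NonWordRunSplitter
{
    // Splits on runs of non-word characters, then drops the empty strings that
    // appear when a separator sits at the very start or end of the input.
    public static string[] SplitWords(string text) =>
        Regex.Split(text, @"\W+")
             .Where(part => part.Length > 0)
             .ToArray();

    public static void Main()
    {
        // Prints: this|is|a|text|and
        Console.WriteLine(string.Join("|", SplitWords("this (is a) text. and")));
        // Prints: a|b
        Console.WriteLine(string.Join("|", SplitWords("(a b)")));
    }
}
```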
Upvotes: 5