Reputation: 33398
I'm trying to select all the tokens that consist only of letters, optionally followed by a single trailing dot.
Examples of valid words: "abc", "abc."
Invalid: "a.b", "a2"
I've tried this:
string[] tokens = text.Split(' ');
var words = from token in tokens
where Regex.IsMatch(token,"^[a-zA-Z]+.?$")
select token;
^[a-zA-Z]+ - one or more letters, anchored at the start of the token
.?$ - ends with 0 or 1 dot (not sure about this part)
Upvotes: 3
Views: 16691
Reputation: 11623
You need to escape the dot:
^[a-zA-Z]+\.?$
Otherwise, . is a special character that matches (almost) any character (by default, every character except a newline), not just periods.
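To make the difference concrete, here is a minimal sketch that runs both patterns over the sample tokens from the question. The class and method names (DotEscapeDemo, MatchesUnescaped, MatchesEscaped) are hypothetical and exist only for this illustration. It assumes verbatim string literals (@"..."), so the backslash reaches the regex engine unchanged; in a regular C# string literal, "\." is not a valid escape sequence.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original post.
public static class DotEscapeDemo
{
    // Unescaped dot: after the letters, '.?' accepts any single character (except newline).
    public static bool MatchesUnescaped(string token) =>
        Regex.IsMatch(token, @"^[a-zA-Z]+.?$");

    // Escaped dot: after the letters, '\.?' accepts only an optional literal period.
    public static bool MatchesEscaped(string token) =>
        Regex.IsMatch(token, @"^[a-zA-Z]+\.?$");

    public static void Main()
    {
        // Sample tokens taken from the question.
        foreach (var token in new[] { "abc", "abc.", "a.b", "a2" })
        {
            Console.WriteLine(
                $"{token}: unescaped={MatchesUnescaped(token)}, escaped={MatchesEscaped(token)}");
        }
    }
}
```

With these inputs, the two patterns should disagree only on "a2": the unescaped version accepts it because the trailing 2 is consumed by the dot, while the escaped version rejects it.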
Upvotes: 2
Reputation: 54887
In regex, an unescaped . pattern matches any character (including digits). Thus, your regex would undesirably match tokens such as "a2". You need to escape your dot character as \.
string[] tokens = text.Split(' ');
var words = from token in tokens
where Regex.IsMatch(token,@"^[a-zA-Z]+\.?$")
select token;
Edit: Furthermore, you can amalgamate your Split(' ') logic into your regex by using lookbehind and lookahead. This might improve efficiency, although it does reduce legibility a bit.
var words = Regex.Matches(text, @"(?<=\ |^)[a-zA-Z]+\.?(?=\ |$)")
.OfType<Match>()
.Select(m => m.Value);
The (?<=\ |^) lookbehind means that the match must be preceded by a space or start-of-string. The (?=\ |$) lookahead means that the match must be followed by a space or end-of-string.
Upvotes: 7