Dan Dinu

Reputation: 33398

Regex -> only letters and end with a dot

I'm trying to select all the tokens that consist only of letters, optionally followed by a single trailing dot.

Examples of valid tokens: "abc", "abc."
Examples of invalid tokens: "a.b", "a2"

I've tried this:

string[] tokens = text.Split(' ');
var words = from token in tokens 
            where Regex.IsMatch(token,"^[a-zA-Z]+.?$")
            select token;

^[a-zA-Z]+ = starts with a letter, and matches one or more letters

.?$ = ends with zero or one dot (I'm not sure about this part)

Upvotes: 3

Views: 16691

Answers (2)

Stelian Matei

Reputation: 11623

You need to escape .

^[a-zA-Z]+\.?$

Otherwise, . is a special character that matches any character except a newline, not just a period.
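As a quick check of the difference, here is a minimal sketch that runs both patterns over the sample tokens from the question. The class and method names (DotDemo, Unescaped, Escaped) are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotDemo
{
    // Hypothetical helper: the original pattern, where . matches any character except \n.
    public static bool Unescaped(string token) =>
        Regex.IsMatch(token, "^[a-zA-Z]+.?$");

    // Hypothetical helper: the corrected pattern, where \. matches only a literal period.
    public static bool Escaped(string token) =>
        Regex.IsMatch(token, @"^[a-zA-Z]+\.?$");

    static void Main()
    {
        // Sample tokens taken from the question.
        foreach (var token in new[] { "abc", "abc.", "a.b", "a2" })
        {
            Console.WriteLine($"{token,-5} unescaped={Unescaped(token)} escaped={Escaped(token)}");
        }
    }
}
```

Of these four tokens, only "a2" is treated differently: the unescaped pattern accepts it because the trailing . consumes the digit, while the escaped pattern rejects it. Both patterns reject "a.b", since a letter follows the dot.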

Upvotes: 2

Douglas

Reputation: 54887

In regex, an unescaped . pattern matches any character (including digits). Thus, your regex would undesirably match tokens such as "a2".

You need to escape the dot as \. so that it matches only a literal period.

string[] tokens = text.Split(' ');
var words = from token in tokens 
            where Regex.IsMatch(token,@"^[a-zA-Z]+\.?$")
            select token;

Edit: Furthermore, you can amalgamate your Split(' ') logic into your regex by using lookbehind and lookahead. This might improve efficiency, although it does reduce legibility a bit.

var words = Regex.Matches(text, @"(?<=\ |^)[a-zA-Z]+\.?(?=\ |$)")
                 .OfType<Match>()
                 .Select(m => m.Value);
  • The (?<=\ |^) lookbehind means that the match must be preceded by a space or start-of-string.
  • The (?=\ |$) lookahead means that the match must be followed by a space or end-of-string.
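
To check that the lookaround version selects the same tokens as splitting on spaces first, here is a small sketch that runs both approaches on one input. The sample text and the names LookaroundDemo, BySplit and ByLookaround are assumptions for illustration; only the two patterns come from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Split on spaces, then validate each token against the anchored pattern.
    public static string[] BySplit(string text) =>
        text.Split(' ')
            .Where(token => Regex.IsMatch(token, @"^[a-zA-Z]+\.?$"))
            .ToArray();

    // Single pass: lookbehind and lookahead stand in for the token boundaries.
    public static string[] ByLookaround(string text) =>
        Regex.Matches(text, @"(?<=\ |^)[a-zA-Z]+\.?(?=\ |$)")
             .OfType<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        // Hypothetical sample input mixing valid and invalid tokens.
        const string text = "abc a.b abc. a2 xyz";
        Console.WriteLine(string.Join(",", BySplit(text)));      // abc,abc.,xyz
        Console.WriteLine(string.Join(",", ByLookaround(text))); // abc,abc.,xyz
    }
}
```

Because the lookarounds do not consume the surrounding spaces, adjacent tokens can each match without interfering with one another.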

Upvotes: 7
