Antenka

Reputation: 1529

Regular expression ASP.NET validator for a few words

I'm trying to create a validator for a string that may contain 1 to N words, separated by exactly one whitespace character (spaces only between words). I'm new to regex, so I'm a bit confused, because my expression seems correct to me:

^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$

What am I doing wrong here? It accepts only 2 words, but I want it to accept 1 or more. Any help is greatly appreciated :)

Upvotes: 1

Views: 315

Answers (1)

Code Jockey

Reputation: 6721

As often happens with someone beginning a new programming language or syntax, you're close, but not quite! The ^ and $ anchors are being used correctly, and the character classes [a-zA-Z] will match only letters (sounds right to me), but your repetition is a little off, and your grouping is not what you think it is - which is your primary problem.

^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$
 ^           ^^^^^^^^ 
 a           bbbacccc

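It only matches two words because you effectively don't have any group repetition; this is because you don't really have any groups, only character classes. The outer [ just opens a character class that happens to include a literal [, and the later ] is read as a literal character, so the {0,} repeats that literal ] rather than a group. The simplest fix is to change the first [ and its matching closing bracket (marked by a's in the listing above) to parentheses: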

^([a-zA-Z]+\s{1}){0,}[a-zA-Z]+$
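If you want to see the difference for yourself, here is a minimal sketch in Python's re module (used only for illustration; your validator runs on a different engine, so treat this as an approximation). The AS_PARSED pattern is my assumption about how the engine reads your original expression, respelled with explicit escapes; the names AS_PARSED, FIXED and accepts are made up for this example.

```python
import re

# Assumed reading of the original pattern, with the literals escaped:
# "[[a-zA-Z]" is ONE class that also contains a literal "[",
# and "]{0,}" is zero or more literal "]" characters.
AS_PARSED = re.compile(r"[\[a-zA-Z]+\s{1}\]{0,}[a-zA-Z]+")

# The fix: parentheses create a real group that {0,} can repeat.
FIXED = re.compile(r"([a-zA-Z]+\s{1}){0,}[a-zA-Z]+")


def accepts(pattern, text):
    """Return True when the whole string matches (the job of ^ and $)."""
    return pattern.fullmatch(text) is not None


samples = ["one", "one two", "one two three", "one  two", "one ]]two"]
for text in samples:
    print(repr(text), accepts(AS_PARSED, text), accepts(FIXED, text))
```

Under that reading, the bracketed version accepts exactly two words (and oddly also "one ]]two"), while the parenthesized version accepts one, two or three words and rejects the double space.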

This single change will make it work the way you expect! However, there are a few recommendations and considerations I'd like to make.

First, for readability and code maintenance, use the single-character repetition operators instead of repetition braces wherever possible. * repeats zero or more times, + repeats one or more times, and ? repeats zero or one time (AKA optional). Your repetition curly braces are syntactically correct and do what you intend them to, but one (marked by b's above) should be removed because it is redundant, and the other (marked by c's above) should be shortened to an asterisk *, as they have exactly the same meaning:

^([a-zA-Z]+\s)*[a-zA-Z]+$
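As a quick sanity check that dropping {1} and swapping {0,} for * changes nothing, the sketch below compares the long and short spellings on a handful of inputs. It uses Python's re module purely for illustration, and VERBOSE, CONCISE and the sample strings are names and data I made up for the example.

```python
import re

# Long spelling with explicit repetition braces.
VERBOSE = re.compile(r"([a-zA-Z]+\s{1}){0,}[a-zA-Z]+")
# Short spelling: \s{1} becomes \s, and {0,} becomes *.
CONCISE = re.compile(r"([a-zA-Z]+\s)*[a-zA-Z]+")

samples = ["one", "one two", "one two three", "", "one  two", " one", "one ", "one2 two"]

# Collect every sample on which the two patterns disagree.
disagreements = [
    s for s in samples
    if (VERBOSE.fullmatch(s) is None) != (CONCISE.fullmatch(s) is None)
]
print(disagreements)  # prints []
```

The empty list only shows agreement on these samples, but it is what you would expect, since the two forms are defined to mean the same thing.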

Second, I would recommend considering (depending upon your application requirements) the \w shorthand character class instead of the [a-zA-Z] character class, with the following considerations:

  • it matches both upper and lowercase letters
  • it does match more than letters (it matches digits 0-9 and the underscore as well)
  • it can often be configured to match non-English (unicode) letters for multi-lingual input

If any of these are unnecessary or undesirable, then you're already on the right track with [a-zA-Z]!
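To make those trade-offs concrete, here is a small sketch, again in Python's re module as a stand-in. How \w is configured differs between engines (in Python the re.ASCII flag restricts it to ASCII; other implementations use their own options), so take the flag as an illustration of the idea rather than something to copy. The names LETTERS, WORD, WORD_ASCII and accepts are hypothetical.

```python
import re

LETTERS = re.compile(r"([a-zA-Z]+\s)*[a-zA-Z]+")       # ASCII letters only
WORD = re.compile(r"(\w+\s)*\w+")                      # \w, Unicode-aware in Python 3
WORD_ASCII = re.compile(r"(\w+\s)*\w+", re.ASCII)      # \w limited to [a-zA-Z0-9_]


def accepts(pattern, text):
    """Return True when the whole string matches."""
    return pattern.fullmatch(text) is not None


samples = ["Hello World", "user_1 item2", "h\u00e9llo w\u00f6rld"]
for text in samples:
    # ascii() keeps the output printable on any console encoding.
    print(ascii(text), accepts(LETTERS, text), accepts(WORD, text), accepts(WORD_ASCII, text))
```

All three accept "Hello World"; only the \w versions accept digits and underscores; and only the Unicode-aware \w accepts the accented words.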

On a side note, the character combination \b is a word-boundary assertion and is not needed for your case, as you will already begin and end where there are letters and letters only!

As for learning more about regular expressions, I would recommend Regular-Expressions.info, which has a wealth of info about regexes and the inner workings and quirks of the various implementations. I also use a tool called RegexBuddy to test and debug expressions.

Upvotes: 2
