Jose Luis Roman

Reputation: 35

Regex: first letter of each word upper case, words separated by spaces, 3 to 20 characters long (C#)

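I was recently assigned what looks to me like an impossible task: create a regex pattern that validates several words in the same sentence or textbox, following these guidelines:

  • The first letter of each name/word has to be upper case
  • Names/words are separated by spaces
  • Each name/word is 3 characters long or more
  • The sentence or textbox text can't be longer than 20 characters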

Example: Joseph Gordon Levitt

This example is exactly 20 characters long, each name (or word) is longer than 3 characters, separated by spaces, and the first letter of each name (or word) is upper case.

I tried this regex pattern ^[A-Z]{1}[a-zA-Z\s]{3,20}$. It works for some strings, but not all.

Upvotes: 1

Views: 2205

Answers (2)

user2316116

Reputation: 6814

One option is this:

^(?!.{21})[A-Z][a-z]{2,}(\s[A-Z][a-z]{2,})*$

Demo: https://dotnetfiddle.net/oWjSI4
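
As a rough illustration of how this pattern could be used from C#, here is a minimal sketch. The class name AsciiNameCheck, the method IsValid and the sample strings are made up for this example; only the pattern itself comes from the answer. Note that [A-Z] and [a-z] cover ASCII letters only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AsciiNameCheck
{
    // Pattern from the answer: at most 20 chars, capitalized ASCII words
    // of 3+ letters, exactly one whitespace char between words.
    private static readonly Regex Pattern =
        new Regex(@"^(?!.{21})[A-Z][a-z]{2,}(\s[A-Z][a-z]{2,})*$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("Joseph Gordon Levitt"));  // True  (exactly 20 chars)
        Console.WriteLine(IsValid("joseph Gordon"));         // False (lowercase first letter)
        Console.WriteLine(IsValid("Jo Gordon"));             // False ("Jo" is shorter than 3)
        Console.WriteLine(IsValid("Joseph Gordon Levitts")); // False (21 chars)
    }
}
```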

Upvotes: 1

Wiktor Stribiżew

Reputation: 627020

Let's walk through the requirements:

  • Each name/word has to have first letter upper case: use \p{Lu}, which matches any uppercase letter.
  • Names/Words separated by spaces: use \s+ (one or more whitespace characters) or \s (exactly one whitespace character).
  • Each name/word 3 characters long or more: the word pattern is thus \p{Lu}\p{L}{2,}, an uppercase letter followed by 2 or more letters.
  • And the sentence or textbox text can't be longer than 20 characters: use a lookahead right after ^ / \A (start of string), either the negative (?!.{21}) or the positive (?=.{0,20}$).

The resulting regex will look like either of these:

^(?!.{21})\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*$
^(?=.{0,20}$)\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*$
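
A small sketch of how the two single-whitespace variants might be compared in C#. The class UnicodeNameCheck, its two method names and the sample inputs are assumptions made for this example. On these single-line samples both lookahead forms give the same answer, and \p{Lu} / \p{L} also accept non-ASCII letters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class UnicodeNameCheck
{
    // Negative lookahead: fail if 21 non-newline chars can be matched from the start.
    private static readonly Regex Negative =
        new Regex(@"^(?!.{21})\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*$");

    // Positive lookahead: require 0 to 20 non-newline chars up to the end.
    private static readonly Regex Positive =
        new Regex(@"^(?=.{0,20}$)\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*$");

    public static bool IsValidNegative(string input) => Negative.IsMatch(input);
    public static bool IsValidPositive(string input) => Positive.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "Joseph Gordon Levitt", // valid with both
            "\u00C9mile Zola",      // valid with both: U+00C9 is an uppercase letter
            "Joseph  Gordon",       // invalid with both: two spaces, but only one \s allowed
            "Al Pacino"             // invalid with both: "Al" has fewer than 3 letters
        };
        foreach (var s in samples)
            Console.WriteLine($"{IsValidNegative(s)} / {IsValidPositive(s)}");
    }
}
```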

Or, if there can be one or more whitespace characters between words:

^(?!.{21})\p{Lu}\p{L}{2,}(?:\s+\p{Lu}\p{L}{2,})*$
^(?=.{0,20}$)\p{Lu}\p{L}{2,}(?:\s+\p{Lu}\p{L}{2,})*$

NOTE: In .NET, $ also matches just before a trailing \n (newline char). If the input can end with a newline and that should be rejected, replace the final $ with \z.
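
To make the difference between $ and \z concrete, here is a minimal sketch. The class name TrailingNewlineCheck, the method names and the test input are assumptions for this example; the patterns are the ones above, with only the final anchor changed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TrailingNewlineCheck
{
    // Ends with $: in .NET this also matches right before a final \n.
    private static readonly Regex WithDollar =
        new Regex(@"^(?!.{21})\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*$");

    // Ends with \z: matches only at the very end of the string.
    private static readonly Regex WithZ =
        new Regex(@"^(?!.{21})\p{Lu}\p{L}{2,}(?:\s\p{Lu}\p{L}{2,})*\z");

    public static bool MatchesWithDollar(string input) => WithDollar.IsMatch(input);
    public static bool MatchesWithZ(string input) => WithZ.IsMatch(input);

    public static void Main()
    {
        string input = "Joseph Gordon\n"; // note the trailing newline

        Console.WriteLine(MatchesWithDollar(input)); // True
        Console.WriteLine(MatchesWithZ(input));      // False
    }
}
```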

See the regex demo.

Details

  • ^ - start of string
  • (?=.{0,20}$) - there must be 0 to 20 non-newline chars in the string till the end
  • \p{Lu} - an uppercase letter
  • \p{L}{2,} - two or more letters
  • (?:\s\p{Lu}\p{L}{2,})* - 0 or more repetitions of:
    • \s - a whitespace (or 1+ whitespaces if \s+ is used)
    • \p{Lu}\p{L}{2,} - an uppercase letter and then any two or more letters
  • $ - end of string (\z is the very end of the string).
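
The breakdown above can also be kept next to the pattern in code. The sketch below assumes RegexOptions.IgnorePatternWhitespace, which lets the pattern span several lines with # comments; the class name CommentedNameCheck and the sample inputs are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CommentedNameCheck
{
    // Same pattern as above, spread over several lines with inline comments.
    private const string Pattern = @"
        ^                     # start of string
        (?=.{0,20}$)          # 0 to 20 non-newline chars up to the end
        \p{Lu}                # an uppercase letter
        \p{L}{2,}             # two or more letters
        (?:                   # zero or more repetitions of:
            \s                #   a single whitespace char
            \p{Lu}\p{L}{2,}   #   an uppercase letter, then two or more letters
        )*
        $                     # end of string
    ";

    private static readonly Regex NameRegex =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    public static bool IsValid(string input) => NameRegex.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("Joseph Gordon Levitt")); // True
        Console.WriteLine(IsValid("Joseph gordon"));        // False (second word not capitalized)
        Console.WriteLine(IsValid(""));                     // False (at least one word is required)
    }
}
```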

Upvotes: 2
