Senne

Reputation: 183

.NET RegEx letters/signs/spaces

I have to allow all letters (including accents) = \w

I have to allow spaces = \s

I have to allow specific signs = [\-\/\.\;\\\,\:\+\(\)]

The only limitation is that these signs can appear max 3 times in the complete string.

At the moment I have this regex

^\w*([\-\/\.\;\\\,\:\+\(\)\s]{0,3}\w*){0,2}?$

But this is too restrictive. Can anyone help me write the correct regex?

Upvotes: 2

Views: 46

Answers (1)

Wiktor Stribiżew

Reputation: 627469

It seems you do not need \w; you need \p{L} to match letters only. Note that in .NET \w matches [\p{L}\p{Mn}\p{Nd}\p{Pc}] (letters, nonspacing marks, decimal digits and connector punctuation such as _), so it is not suitable for this scenario.
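To make the difference concrete, here is a minimal sketch comparing the two classes. The class name and sample strings are made up for illustration; the accented letters are written as \u escapes so that they are guaranteed to be precomposed code points.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordVsLetterDemo
{
    public static void Main()
    {
        // \w accepts digits and underscores in addition to letters.
        Console.WriteLine(Regex.IsMatch("abc_123", @"\A\w+\z"));             // True

        // \p{L} accepts letters only, so the same input is rejected.
        Console.WriteLine(Regex.IsMatch("abc_123", @"\A\p{L}+\z"));          // False

        // Precomposed accented letters (here: Éloïse) are still letters.
        Console.WriteLine(Regex.IsMatch("\u00C9lo\u00EFse", @"\A\p{L}+\z")); // True
    }
}
```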

Combine all into 1 pattern - ^[-\p{L}\s/.;\\,:+()]+$ - and restrict it with a (?!(?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4}) negative lookahead anchored at the start. The lookahead fails the match as soon as the string contains 4 of these special symbols, whether they are consecutive or not (so, 0-3 occurrences are allowed):

\A(?!(?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4})[-\p{L}\s/.;\\,:+()]+\z

See the regex demo

  • \A - start of string
  • (?!(?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4}) - a negative lookahead that fails the match if its pattern matches:
    • (?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4} - 4 sequences of:
      • [^-/.;\\,:+()]* - zero or more chars other than those defined in the set
      • [-/.;\\,:+()] - 1 char defined in the set
  • [-\p{L}\s/.;\\,:+()]+ - 1 or more chars defined in the character class
  • \z - the very end of string.

C# declaration using a verbatim string literal:

var pattern = @"\A(?!(?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4})[-\p{L}\s/.;\\,:+()]+\z";
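A quick way to check the pattern is to run it against a few inputs. This is only a sketch: the class name and the sample strings are made up for illustration, and the pattern is the one declared above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SignLimitDemo
{
    // Letters, whitespace and the listed signs, with at most 3 signs in total.
    public const string Pattern =
        @"\A(?!(?:[^-/.;\\,:+()]*[-/.;\\,:+()]){4})[-\p{L}\s/.;\\,:+()]+\z";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("Rue de la Paix", Pattern)); // True  (no signs)
        Console.WriteLine(Regex.IsMatch("a-b/c.d", Pattern));        // True  (3 signs)
        Console.WriteLine(Regex.IsMatch("a-b/c.d;e", Pattern));      // False (4 signs)
        Console.WriteLine(Regex.IsMatch("abc1", Pattern));           // False (digit not allowed)
        Console.WriteLine(Regex.IsMatch("", Pattern));               // False (+ needs 1 char)
    }
}
```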

Another approach: use a non-capturing group and apply the {0,3} limiting quantifier to it:

\A[\p{L}\s]*(?:[-/.;\\,:+()][\p{L}\s]*){0,3}[\p{L}\s]*\z

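See another regex demo. Note this expression will also match an empty string. To prevent it, replace the first or last [\p{L}\s]* with [\p{L}\s]+. Be aware that this also requires a letter or whitespace char at that position, so a string that starts (or ends, respectively) with one of the signs will no longer match. If only the empty string should be rejected, add a (?!\z) lookahead right after \A instead.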

  • \A - start of string
  • [\p{L}\s]* - 0+ letter or whitespace chars
  • (?:[-/.;\\,:+()][\p{L}\s]*){0,3} - 0 to 3 occurrences of:
    • [-/.;\\,:+()] - 1 char from the set
    • [\p{L}\s]* - 0+ letter or whitespace chars
  • [\p{L}\s]* - 0+ letter or whitespace chars
  • \z - the very end of string.
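The quantifier-based pattern can be checked the same way. In this sketch the class name, the sample strings and the NonEmptyPattern variant (a (?!\z) lookahead added after \A to reject only the empty string) are illustrative assumptions, not part of the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupLimitDemo
{
    // Pattern from the answer: up to 3 signs, each followed by optional letters/whitespace.
    public const string Pattern =
        @"\A[\p{L}\s]*(?:[-/.;\\,:+()][\p{L}\s]*){0,3}[\p{L}\s]*\z";

    // Assumed variant: same pattern, but (?!\z) right after \A rejects the empty string.
    public const string NonEmptyPattern =
        @"\A(?!\z)[\p{L}\s]*(?:[-/.;\\,:+()][\p{L}\s]*){0,3}[\p{L}\s]*\z";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("a-b/c.d", Pattern));    // True  (3 signs)
        Console.WriteLine(Regex.IsMatch("a-b/c.d;e", Pattern));  // False (4 signs)
        Console.WriteLine(Regex.IsMatch("", Pattern));           // True  (empty string matches)

        Console.WriteLine(Regex.IsMatch("", NonEmptyPattern));   // False
        Console.WriteLine(Regex.IsMatch("-", NonEmptyPattern));  // True  (a lone sign is still fine)
    }
}
```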

Upvotes: 2
