Reputation: 6769
I'm trying to come up with a regex in C# that matches exactly three Cyrillic words, where each word may also contain hyphens.
Matches: "АБ АБ А", "А-Б А-Б А-Б", "А-Б-А АБ АБ", etc.
Doesn't match: "АБ АБ", "А-Б АБ", "АБ АБ-", etc.
So far I only have a regex that matches Cyrillic letters: ^[\u0400-\u04FF]+$
Upvotes: 2
Views: 177
Reputation: 186803
First of all, let's spell out the rules:
- A word must start with a letter and end with a letter
- A word can contain any number of hyphens, but each hyphen must be surrounded by letters (leading, trailing, or doubled hyphens are not allowed)
So for a single word we have
[\u0400-\u04FF](-?[\u0400-\u04FF]+)*
Some examples (testing the whole string against the pattern):
АБ // correct
АБ-А-АБ // correct (with hyphens)
Z // incorrect: non-Cyrillic letter
-А // incorrect: leading hyphen
А- // incorrect: trailing hyphen
А--Б // incorrect: double hyphen
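The examples above can be checked with a minimal sketch like the following. The class and method names (`SingleWordCheck`, `IsWord`) are made up for illustration, and the `^` and `$` anchors are added here so that the whole input has to be a single word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleWordCheck
{
    // The single-word pattern, anchored with ^ and $ (an addition for this
    // sketch) so that the entire input must be one word.
    private static readonly Regex Word =
        new Regex(@"^[\u0400-\u04FF](-?[\u0400-\u04FF]+)*$");

    // Returns true when s is one Cyrillic word with correctly placed hyphens.
    public static bool IsWord(string s) => Word.IsMatch(s);
}
```

With this, `SingleWordCheck.IsWord("АБ-А-АБ")` returns true, while `SingleWordCheck.IsWord("А--Б")` and `SingleWordCheck.IsWord("-А")` return false.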
Now, for the words: we want exactly three words separated by one or more whitespace characters (\s+):
^[\u0400-\u04FF](-?[\u0400-\u04FF]+)*(\s+[\u0400-\u04FF](-?[\u0400-\u04FF]+)*){2}$
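A possible way to use this from C# is sketched below. The names `ThreeWordsCheck` and `IsThreeWords` are hypothetical. The word sub-pattern is written as `[\u0400-\u04FF]+(?:-[\u0400-\u04FF]+)*`, which is my rewrite and should accept the same strings as the one above. It avoids an optional hyphen inside a repeated group, a shape that can backtrack heavily on long non-matching input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeWordsCheck
{
    // One word: runs of Cyrillic letters joined by single hyphens.
    // Assumption: equivalent to [\u0400-\u04FF](-?[\u0400-\u04FF]+)*,
    // rewritten so that the repeated group always starts with a hyphen.
    private const string Word = @"[\u0400-\u04FF]+(?:-[\u0400-\u04FF]+)*";

    // Exactly three words separated by one or more whitespace characters.
    private static readonly Regex ThreeWords =
        new Regex("^" + Word + @"(?:\s+" + Word + "){2}$");

    // Returns true when s consists of exactly three valid words.
    public static bool IsThreeWords(string s) => ThreeWords.IsMatch(s);
}
```

Run against the samples from the question, `IsThreeWords` returns true for "АБ АБ А", "А-Б А-Б А-Б" and "А-Б-А АБ АБ", and false for "АБ АБ", "А-Б АБ" and "АБ АБ-".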
Upvotes: 1