Yulian

Reputation: 6769

Regex to match three Cyrillic words that may contain hyphens

I'm trying to come up with a C# regex that matches exactly three Cyrillic words, each of which may contain hyphens.

Matches: "АБ АБ А", "А-Б А-Б А-Б", "А-Б-А АБ АБ", etc.

Doesn't match: "АБ АБ", "А-Б АБ", "АБ АБ-", etc.

So far I only have a regex for Cyrillic letters: ^[\u0400-\u04FF]+$

Upvotes: 2

Views: 177

Answers (1)

Dmitrii Bychenko

Reputation: 186803

First of all, let's spell out the rules:

  • A word must start with a letter and end with a letter
  • A word can contain any number of hyphens; each hyphen must be surrounded by letters (leading, trailing, or doubled hyphens are not allowed)

So for a single word we have

  [\u0400-\u04FF](-?[\u0400-\u04FF]+)*

Some examples:

  АБ      // correct
  АБ-А-АБ // correct (with hyphens)
  Z       // incorrect: non-Cyrillic letter
  -А      // incorrect: starting hyphen
  А-      // incorrect: dangling hyphen
  А--Б    // incorrect: double hyphen
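
The examples above can be checked with a small C# sketch. It assumes the single-word pattern is wrapped in ^ and $ so that the whole input must be one word; the names SingleWordCheck and IsWord are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleWordCheck
{
    // The single-word pattern from above, anchored with ^ and $
    // (the anchors are an assumption added for this standalone check).
    private static readonly Regex Word =
        new Regex(@"^[\u0400-\u04FF](-?[\u0400-\u04FF]+)*$");

    // Hypothetical helper: true when the whole string is one valid word.
    public static bool IsWord(string s) => Word.IsMatch(s);

    public static void Main()
    {
        // Same samples as listed above; only the first two should be accepted.
        string[] samples = { "АБ", "АБ-А-АБ", "Z", "-А", "А-", "А--Б" };

        foreach (string s in samples)
            Console.WriteLine($"{s}: {IsWord(s)}");
    }
}
```

A single letter such as "А" is also accepted, because the group (-?[\u0400-\u04FF]+)* may repeat zero times.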

Now for the whole string: we want exactly three words separated by one or more whitespace characters (\s+):

  ^[\u0400-\u04FF](-?[\u0400-\u04FF]+)*(\s+[\u0400-\u04FF](-?[\u0400-\u04FF]+)*){2}$
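
As a rough check, here is a minimal C# sketch that builds the same pattern from its parts and runs it against the strings from the question. The names ThreeWordsCheck, Letter, WordPattern and IsThreeWords are illustrative only, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeWordsCheck
{
    // Building blocks; concatenated they give exactly the pattern shown above.
    private const string Letter = @"[\u0400-\u04FF]";
    private const string WordPattern = Letter + "(-?" + Letter + "+)*";

    // One word, then exactly two more, each preceded by one or more whitespace characters.
    private static readonly Regex ThreeWords =
        new Regex("^" + WordPattern + @"(\s+" + WordPattern + "){2}$");

    // Hypothetical helper: true when the input consists of exactly three valid words.
    public static bool IsThreeWords(string s) => ThreeWords.IsMatch(s);

    public static void Main()
    {
        // Strings the question expects to match.
        string[] shouldMatch = { "АБ АБ А", "А-Б А-Б А-Б", "А-Б-А АБ АБ" };
        // Strings the question expects to be rejected.
        string[] shouldFail = { "АБ АБ", "А-Б АБ", "АБ АБ-" };

        foreach (string s in shouldMatch)
            Console.WriteLine($"\"{s}\": {IsThreeWords(s)}"); // True for each

        foreach (string s in shouldFail)
            Console.WriteLine($"\"{s}\": {IsThreeWords(s)}"); // False for each
    }
}
```

Note that in .NET `$` also matches just before a final newline, so "АБ АБ А\n" would pass; if that matters, `\z` can be used instead of `$`.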

Upvotes: 1
