Reputation: 5154
I have a string containing some Cyrillic words. Each word starts with a capital letter.
var str = 'ХєлпМіПліз';
I found this solution:
str.match(/[А-Я][а-я]+/g)
But it returns ["Пл"] instead of ["Хєлп", "Мі", "Пліз"]. It seems it doesn't recognize Ukrainian letters ('і', 'є'), only Russian ones.
So how do I have to change that regex to include Ukrainian letters?
Upvotes: 11
Views: 15261
Reputation: 31
Only Ukrainian letters, without the Russian-only ones:
/[бвгґджзклмнпрстфхцчшщйаеєиіїоуюяь]/gi
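A minimal sketch of using that class in JavaScript to check whether a whole word is spelled only with Ukrainian letters. The ^…$ anchors and the sample words are illustrative additions, not part of the answer. Russian-only letters such as ы and ё make the check fail.

```javascript
// The answer's 33 lowercase Ukrainian letters; the i flag lets the class match uppercase too
const ukOnly = /^[бвгґджзклмнпрстфхцчшщйаеєиіїоуюяь]+$/i;

// Hypothetical sample words: three Ukrainian, two containing Russian-only letters
const samples = ['Пліз', 'Європа', 'ҐУЛЯ', 'мыло', 'ёлка'];
const results = samples.map((w) => ukOnly.test(w));
console.log(results); // [ true, true, true, false, false ]
```

Because of the i flag, this class cannot by itself split 'ХєлпМіПліз' at the capitals; it is better suited to validating or tokenizing on non-letters.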
Upvotes: 3
Reputation: 146
Works with Ukrainian letters such as 'і' and others (Python):
r's/[^а-яА-Я.!?]/./g+'
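To see what that pattern does, here is a JavaScript translation, assuming the snippet is meant as a sed-style s/pattern/replacement/g substitution (that reading is an assumption). Note that the ranges а-я and А-Я (U+0430 to U+044F and U+0410 to U+042F) do not themselves contain і, ї, є or ґ, so on the sample string those letters are among the characters replaced:

```javascript
// Assumption: s/[^а-яА-Я.!?]/./g means "replace every character outside
// а-я, А-Я, '.', '!' and '?' with a dot"
const str = 'ХєлпМіПліз';
const replaced = str.replace(/[^а-яА-Я.!?]/g, '.');
console.log(replaced); // Х.лпМ.Пл.з
```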
Upvotes: 2
Reputation: 518
[А-Я] is not the Cyrillic alphabet, it's just Russian!
Cyrillic is a writing system. It is used in the alphabets of many languages (just as the Latin script is used for Western European, Eastern European and other languages).
To cover both Russian and Ukrainian (uppercase), you'd need [А-ЯҐЄІЇ].
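Applied to the question, a minimal sketch pairs that uppercase class with the corresponding lowercase one (the lowercase half, [а-яґєії], is an addition for this example, not part of the answer):

```javascript
// Uppercase: А-Я plus Ґ Є І Ї; lowercase: а-я plus ґ є і ї
const ruUk = /[А-ЯҐЄІЇ][а-яґєії]+/g;
const words = 'ХєлпМіПліз'.match(ruUk);
console.log(words); // [ 'Хєлп', 'Мі', 'Пліз' ]
```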
To add Belarusian: [А-ЯҐЄІЇЎ]
And for all Cyrillic characters (including the Balkan languages and Old Cyrillic), you can use a Unicode class such as \p{IsCyrillic} (Java and .NET syntax; in JavaScript, since ES2018, the equivalent is \p{Script=Cyrillic} with the u flag).
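In JavaScript, a sketch of the all-Cyrillic approach uses the ES2018 property escape \p{Script=Cyrillic}, which requires the u flag (\p{IsCyrillic} is not accepted there). The sample words are illustrative:

```javascript
// ES2018+ only: Unicode property escapes require the u flag
const cyrillicWord = /^\p{Script=Cyrillic}+$/u;

// Ukrainian, Serbian, Latin, and a Ukrainian word followed by punctuation
const checks = ['Хєлп', 'Ђорђе', 'Hello', 'Пліз!'].map((w) => cyrillicWord.test(w));
console.log(checks); // [ true, true, false, false ]
```

This accepts every Cyrillic letter (including Russian ы and ё), so it answers "is this Cyrillic?" rather than "is this Ukrainian?".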
[А-ЩЬЮЯҐЄІЇ] (uppercase only)
or [А-ЩЬЮЯҐЄІЇа-щьюяґєії] (both cases)
seems to be the full Ukrainian alphabet of 33 letters in each case.
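The 33-letter claim can be checked mechanically. This sketch scans the Unicode Cyrillic block (U+0400 to U+04FF) and counts how many code points each half of the class matches; the А-Щ range stops before the Russian-only Ъ, Ы and Э, which is why Ь, Ю and Я are listed separately:

```javascript
const upper = /[А-ЩЬЮЯҐЄІЇ]/;
const lower = /[а-щьюяґєії]/;

// Count the code points in the Cyrillic block that each class matches
let nUpper = 0;
let nLower = 0;
for (let cp = 0x0400; cp <= 0x04ff; cp++) {
  const ch = String.fromCodePoint(cp);
  if (upper.test(ch)) nUpper++;
  if (lower.test(ch)) nLower++;
}
console.log(nUpper, nLower); // 33 33

// Combined into one pattern, the two classes split the question's string at capitals
const words = 'ХєлпМіПліз'.match(/[А-ЩЬЮЯҐЄІЇ][а-щьюяґєії]+/g);
console.log(words); // [ 'Хєлп', 'Мі', 'Пліз' ]
```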
The apostrophe is not a letter, but it is occasionally included in the alphabet because it affects the pronunciation of the following vowel. It is part of the word, not a separator. It may be encoded in a few ways:
U+0027 "'" APOSTROPHE
U+0060 "`" GRAVE ACCENT
U+2019 "’" RIGHT SINGLE QUOTATION MARK
U+02BC "ʼ" MODIFIER LETTER APOSTROPHE
and maybe some more.
Yes, the apostrophe is a bit complicated. There is no common standard for it.
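A minimal sketch of a word pattern that accepts the four apostrophe variants listed above. As a design choice (an assumption, not from the answer), an apostrophe is only allowed between letters, so quotation marks around a word are not swallowed into it; the sample sentence is illustrative:

```javascript
// Ukrainian letters in both cases, as in the class above
const L = 'А-ЩЬЮЯҐЄІЇа-щьюяґєії';
// Apostrophe variants: U+0027 ', U+0060 `, U+2019 ’, U+02BC ʼ
const APOS = "'`\u2019\u02BC";
// A run of letters, optionally continued by "apostrophe + more letters"
const ukWord = new RegExp(`[${L}]+(?:[${APOS}][${L}]+)*`, 'g');

const text = "Він сказав 'м\u2019ясо' і з'їв.";
const words = text.match(ukWord);
console.log(words); // [ 'Він', 'сказав', 'м’ясо', 'і', "з'їв" ]
```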
Upvotes: 38
Reputation: 3627
The Ukrainian alphabet has four letters that the Russian alphabet lacks: і, є, ї, ґ. A word can also contain a single quote (apostrophe) inside it:
"ґуля, з'їсти, істота, Європа".match(/[а-яієїґ\']+/ig)
The i flag at the end makes the match case-insensitive, so uppercase letters such as the Є in "Європа" are matched too.
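A runnable version of that one-liner, plus what happens on the question's string: with the i flag, a capital letter no longer marks the start of a new word, so the pattern tokenizes on non-letters instead.

```javascript
const pattern = /[а-яієїґ']+/gi;
const words = "ґуля, з'їсти, істота, Європа".match(pattern);
console.log(words); // [ 'ґуля', "з'їсти", 'істота', 'Європа' ]

// Case-insensitive, so the run of letters is not split at capitals
const whole = 'ХєлпМіПліз'.match(pattern);
console.log(whole); // [ 'ХєлпМіПліз' ]
```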
Upvotes: 9
Reputation: 1601
Use \p{Lu} to match uppercase letters, \p{Ll} for lowercase, or \p{L} to match any letter.
Update: this originally worked only in Java, not in JavaScript; since ES2018, JavaScript supports the same escapes when the u flag is set. Don't forget to handle the apostrophe and "ї" in your regexp.
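A sketch of the property-escape approach in JavaScript (ES2018+ engines, u flag required). Note that \p{Lu} and \p{Ll} match letters of any script, Latin included. The second pattern, which allows an apostrophe between lowercase letters, is an illustrative extension:

```javascript
// Split at capital letters using Unicode general categories
const str = 'ХєлпМіПліз';
const words = str.match(/\p{Lu}\p{Ll}+/gu);
console.log(words); // [ 'Хєлп', 'Мі', 'Пліз' ]

// Allow ', ’ or ʼ between lowercase letters
const withApos = "П'ятьЗ\u2019їв".match(/\p{Lu}\p{Ll}*(?:['\u2019\u02BC]\p{Ll}+)*/gu);
console.log(withApos); // [ "П'ять", 'З’їв' ]
```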
Upvotes: 12
Reputation: 1067
[А-Я][а-я] really doesn't include Ukrainian letters. 'я' is \u044f, but 'є' is \u0454 and 'і' is \u0456 (\u0404 for Є, \u0406 for І), so they fall outside the а-я range. You have to add them to the regex by hand:
/[А-ЯЄІ][а-яєі]+/g
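Cyrillic І/і and Latin I/i look identical, so it is easy to type the wrong one into a class. This sketch writes the same classes with \u escapes to make the code points explicit, and shows what happens if Latin I/i (U+0049/U+0069) end up in the class instead:

```javascript
// А-Я, Є, І  /  а-я, є, і, written as explicit code points
const re = /[\u0410-\u042F\u0404\u0406][\u0430-\u044F\u0454\u0456]+/g;
const words = 'ХєлпМіПліз'.match(re);
console.log(words); // [ 'Хєлп', 'Мі', 'Пліз' ]

// Same idea but with Latin I/i: the Cyrillic і in the string is no longer matched
const latin = 'ХєлпМіПліз'.match(/[\u0410-\u042F\u0404\u0049][\u0430-\u044F\u0454\u0069]+/g);
console.log(latin); // [ 'Хєлп', 'Пл' ]
```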
Upvotes: 4
Reputation: 89574
One way to solve this is to look at the Unicode table to determine the character ranges you need. For example, if I use the pattern
str.match(/[А-Я][а-яєі]+/g)
it works with your example string. (Sorry, I don't know the Ukrainian letters.)
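Instead of reading the Unicode table by hand, a small helper can list which characters of a sample string a class misses, together with their code points. The function name uncovered is hypothetical, and it expects a non-global, single-character class so that test() carries no lastIndex state:

```javascript
// Hypothetical helper: distinct characters of `text` that `cls` does not match,
// formatted as "char U+XXXX"
function uncovered(text, cls) {
  const out = [];
  for (const ch of new Set(text)) {
    if (!cls.test(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      out.push(`${ch} U+${hex}`);
    }
  }
  return out;
}

console.log(uncovered('ХєлпМіПліз', /[А-Яа-я]/)); // [ 'є U+0454', 'і U+0456' ]
console.log(uncovered('ХєлпМіПліз', /[А-Яа-яєі]/)); // []
```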
Upvotes: 4