Reputation: 87
I'm trying to extract the Cyrillic letters from a mixed input but can't get it to split the way I want. No numbers or special characters are involved.
input = "я я я я я w w w w w w\nф ф ф ф ф v v v v v v"
output = re.split("![а-я]\s*", input)
print(output)
I want to get rid of the w and v letters and just print the Russian ones. With my code, the output is the same as the input, except that it is now in a list.
Upvotes: 2
Views: 1904
Reputation: 627100
If you need to get all the Russian letters from your string, use the (?i)[А-ЯЁ] regex with re.findall (do not forget about Ё, as the [А-Я] range does not include it).
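The point about Ё can be checked directly. This is a small sketch using a made-up sample string (text is not from the question): Ё and ё sit outside the contiguous А-Я and а-я code point ranges, so a class without an explicit Ё skips them even with (?i).

```python
import re

# Hypothetical sample containing both cases of the letter Ё.
text = "Ёлка ёж"

# Ё is U+0401 and ё is U+0451, outside А-Я (U+0410..U+042F)
# and а-я (U+0430..U+044F).
print(hex(ord("Ё")), hex(ord("А")), hex(ord("Я")))

without_yo = re.findall(r"(?i)[А-Я]", text)
with_yo = re.findall(r"(?i)[А-ЯЁ]", text)

print(without_yo)  # ['л', 'к', 'а', 'ж']
print(with_yo)     # ['Ё', 'л', 'к', 'а', 'ё', 'ж']
```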
Tested in Python 3:
>>> import re
>>> input = "я я я я я w w w w w w\nф ф ф ф ф v v v v v v"
>>> output = re.findall(r'(?i)[А-ЯЁ]', input)
>>> print(output)
['я', 'я', 'я', 'я', 'я', 'ф', 'ф', 'ф', 'ф', 'ф']
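For comparison, here is a sketch of why the re.split attempt from the question returns the input unchanged. In Python's re syntax "!" is an ordinary character, not a negation operator, so the pattern searches for a literal "!" followed by a lowercase Cyrillic letter and never matches. A negated class is written [^...]. The variable names below are illustrative only.

```python
import re

text = "я я я я я w w w w w w\nф ф ф ф ф v v v v v v"

# Literal "!" plus one Cyrillic letter: no match, so nothing is split
# and the whole string comes back as a single list element.
unchanged = re.split(r"![а-я]\s*", text)
print(unchanged == [text])  # True

# Splitting on runs of non-Cyrillic characters also works, but it can
# leave empty strings at the edges, so those are filtered out here.
pieces = [p for p in re.split(r"[^А-Яа-яЁё]+", text) if p]
print(pieces)
```

re.findall avoids the filtering step, which is one reason to prefer it for this task.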
To also extract Ukrainian letters, add ЇІЄҐ to the character class:
output = re.findall(r"(?i)[А-ЯЁЇІЄҐ]", input)
An apostrophe is also considered a Ukrainian letter; I don't know whether you want to include it in the pattern.
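If the apostrophe is wanted, one option is to add it to the class. The sketch below assumes the input uses the plain ASCII apostrophe (typographic variants would have to be listed separately) and uses a made-up sample string. Note that this would also pick up apostrophes in any non-Cyrillic text.

```python
import re

# Hypothetical sample: Ukrainian words mixed with Latin letters.
text = "м'яч w їжак v"

# Letters only, using the Ukrainian-extended class.
letters_only = re.findall(r"(?i)[А-ЯЁЇІЄҐ]", text)

# Same class with the ASCII apostrophe added (assumption about the input).
with_apostrophe = re.findall(r"(?i)[А-ЯЁЇІЄҐ']", text)

print(letters_only)     # ['м', 'я', 'ч', 'ї', 'ж', 'а', 'к']
print(with_apostrophe)  # ['м', "'", 'я', 'ч', 'ї', 'ж', 'а', 'к']
```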
Upvotes: 2