Reputation: 13
I am trying to find the words in a string that do not start or end with any of the letters 'aıoueəiöü'. But the regex fails to find the right words when I use this code:
import re
txt = "Nasa has fixed a problem with malfunctioning equipment on a new rocket designed to take astronauts to the Moon."
re.findall(r"\b[^aıoueəiöü]\w+[^aıoueəiöü]\b",txt)
However, it works fine when the whitespace class \s is added to the negated character classes:
re.findall(r"\b[^aıoueəiöü\s]\w+[^aıoueəiöü\s]\b",txt)
I cannot understand what the issue is in the first code example. Why do I have to exclude whitespace characters too?
Upvotes: 1
Views: 170
Reputation: 627022
Note that [^aıoueəiöü] matches any char other than a, ı, o, u, e, ə, i, ö and ü. It can match a whitespace, a digit, punctuation, etc.
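To make this concrete, here is a minimal sketch that runs the pattern from the question on a short sample string (the names `vowels`, `original` and `sample` are just illustrative). It shows the negated class consuming the spaces around a word:

```python
import re

vowels = "aıoueəiöü"
# The pattern from the question, built from the vowel set
original = re.compile(rf"\b[^{vowels}]\w+[^{vowels}]\b")

# A bare negated class accepts non-letters such as space, digit, punctuation
for ch in (" ", "7", "."):
    assert re.fullmatch(rf"[^{vowels}]", ch)

sample = "a new rocket"
matches = original.findall(sample)
print(matches)  # [' new ', 'rocket']
```

The first match is ' new ' with both spaces included: the leading and trailing negated classes matched the spaces rather than the letters n and w.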
Also, your regex only matches strings of at least three chars (one for each negated class plus at least one for \w+), so you need to adjust it to match one- and two-char strings, too.
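A quick check of the length limitation, as a sketch using the question's pattern on single tokens (the variable names are illustrative only):

```python
import re

original = re.compile(r"\b[^aıoueəiöü]\w+[^aıoueəiöü]\b")

# One- and two-letter words cannot satisfy the three required positions
short_results = [original.findall(word) for word in ("x", "by")]
print(short_results)  # [[], []]

# A three-letter word that starts and ends with a consonant does match
three_letter = original.findall("cat")
print(three_letter)   # ['cat']
```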
You do not have to rely on excluding whitespace from the pattern. Since you only want to match word chars other than vowels, add \W rather than \s:
\b[^\Waıoueəiöü](?:\w*[^\Waıoueəiöü])?\b
See the regex demo.
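For reference, this is how the pattern could be used in Python on the sentence from the question. It is only a sketch: the variable names are illustrative, and note that the match is case-sensitive, so uppercase vowels are not excluded unless they are added to the set or re.I is passed.

```python
import re

txt = ("Nasa has fixed a problem with malfunctioning equipment on a new "
       "rocket designed to take astronauts to the Moon.")

# [^\W...] means "a word char that is not one of the listed vowels"
pattern = re.compile(r"\b[^\Waıoueəiöü](?:\w*[^\Waıoueəiöü])?\b")

words = pattern.findall(txt)
print(words)
# ['has', 'fixed', 'problem', 'with', 'malfunctioning', 'new', 'rocket', 'designed', 'Moon']

# One- and two-char words are matched as well
short_words = pattern.findall("x by a")
print(short_words)  # ['x', 'by']
```

The group is non-capturing, so re.findall returns the whole matches rather than group contents.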
Details:
\b - a word boundary
[^\Waıoueəiöü] - any word char except a letter from the aıoueəiöü set
(?:\w*[^\Waıoueəiöü])? - an optional occurrence of
    \w* - any zero or more word chars
    [^\Waıoueəiöü] - any word char except a letter from the aıoueəiöü set
\b - a word boundary
Upvotes: 1