azizalizada

Reputation: 13

Vowels not at the end or start of the words in a string

I am trying to find the words in a string that do not start or end with any of the letters 'aıoueəiöü'. But the regex fails to find the right words when I use this code:

import re

txt = "Nasa has fixed a problem with malfunctioning equipment on a new rocket designed to take astronauts to the Moon."

re.findall(r"\b[^aıoueəiöü]\w+[^aıoueəiöü]\b",txt)

However, it works fine when the whitespace class \s is added to the negated character classes:

re.findall(r"\b[^aıoueəiöü\s]\w+[^aıoueəiöü\s]\b",txt)

I cannot understand what is wrong with the first example. Why do I have to exclude whitespace characters too?

Upvotes: 1

Views: 170

Answers (1)

Wiktor Stribiżew

Reputation: 627022

Note that [^aıoueəiöü] matches any char other than a, ı, o, u, e, ə, i, ö and ü. That means it can also match whitespace, a digit, punctuation, etc.
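To make this concrete, here is a small sketch using Python's re module. The sample strings "a b" and "fixed a problem" and the variable names are only illustrative, they are not from the question:

```python
import re

# Negated class copied from the question: it excludes only these vowels.
NOT_VOWEL = r"[^aıoueəiöü]"

# A space is "not one of the vowels", so the class accepts it.
single_chars = re.findall(NOT_VOWEL, "a b")
print(single_chars)  # [' ', 'b']

# The question's first pattern can therefore start or end a match on a space.
original = r"\b[^aıoueəiöü]\w+[^aıoueəiöü]\b"
spilled = re.findall(original, "fixed a problem")
print(spilled)  # ['fixed ', ' problem']
```

The matches carry a trailing or leading space because the space was consumed by one of the negated classes, and \b is still satisfied on the far side of it.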

Also, your regex only matches strings of at least three chars. You need to adjust it to match one- and two-char strings, too.
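A quick way to see the length limit is to run the \s version from the question on a short sample. The string "my cat" is an assumed example:

```python
import re

# Second pattern from the question: one char, then \w+ (one or more),
# then one more char, so every match is at least three chars long.
with_s = r"\b[^aıoueəiöü\s]\w+[^aıoueəiöü\s]\b"

short_words = re.findall(with_s, "my cat")
print(short_words)  # ['cat']
```

"my" starts and ends with a non-vowel but is skipped, because two chars cannot fill all three mandatory parts of the pattern.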

You do not have to rely on excluding whitespace from the pattern. Since you only want to match word chars other than vowels, add \W to the negated character class rather than \s:

\b[^\Waıoueəiöü](?:\w*[^\Waıoueəiöü])?\b

See the regex demo.

Details:

  • \b - a word boundary
  • [^\Waıoueəiöü] - any word char except a letter from the aıoueəiöü set
  • (?:\w*[^\Waıoueəiöü])? - an optional occurrence of
    • \w* - any zero or more word chars
    • [^\Waıoueəiöü] - any word char except a letter from the aıoueəiöü set
  • \b - a word boundary
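As a rough check, the pattern can be run on the sentence from the question. The variable names and the extra sample "my b a" are only for illustration:

```python
import re

txt = ("Nasa has fixed a problem with malfunctioning equipment on a new "
       "rocket designed to take astronauts to the Moon.")

# [^\W...] means "a word char that is not one of the listed vowels".
pattern = re.compile(r"\b[^\Waıoueəiöü](?:\w*[^\Waıoueəiöü])?\b")

# The group is non-capturing, so findall returns the whole matches.
words = pattern.findall(txt)
print(words)
# ['has', 'fixed', 'problem', 'with', 'malfunctioning', 'new', 'rocket', 'designed', 'Moon']

# One- and two-char words are matched too, since the group is optional.
short = pattern.findall("my b a")
print(short)  # ['my', 'b']
```

Keep in mind that the class lists lowercase vowels only, so a word starting with an uppercase vowel would still match, and digits and underscores count as non-vowel word chars.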

Upvotes: 1
