Santosh Kumar

Reputation: 27875

How to match words with no vowel?

The notion of a vowel can be subjective, so I have this set of rules:

I have the following string:

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

My try:

import re

s = re.findall(r'[^aeiouAEIOU].*', text)
print(s)

Expectation:

['sntshk', 'xx', 'yy', 'zz']

Reality:

['line with every word a vowel', '\nsntshk xx yy.', '\nOkay zz fine.']

Related: Search all words with no vowels

Upvotes: 1

Views: 3433

Answers (5)

Chris

Reputation: 363

This works:

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""
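q = ''
s = text.split()
for i in range(len(s)):
    s[i] = s[i].strip('.')  # drop a trailing full stop
    for c in range(len(s[i])):
        if s[i][c].lower() in 'aeiou':
            break  # the word contains a vowel, so skip it
    else:
        q += s[i] + ' '  # no vowel found, so keep the word
print(q)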

Upvotes: 0

Austin

Reputation: 26039

There is a pure Python way you can do this without any imports:

[x.strip('.') for x in text.split() if all(y.lower() not in 'aeiou' for y in x)]

Example:

text = """line with every word a vowel 
sntshk xx yy.
Okay zz fine."""

print([x.strip('.') for x in text.split() if all(y.lower() not in 'aeiou' for y in x)])
# ['sntshk', 'xx', 'yy', 'zz']

Upvotes: 1

Tim Biegeleisen

Reputation: 521794

I would just use the pattern \b[^AEIOU_0-9\W]+\b in case-insensitive mode:

import re

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

# assign the result so that print(s) has something to show
s = re.findall(r'\b[^AEIOU_0-9\W]+\b', text, flags=re.I)
print(s)

['sntshk', 'xx', 'yy', 'zz']

The pattern [^\W] is in fact a double negative and means any word character. Adding vowels, digits, and underscore to this negated class excludes them as well, leaving only consonants.
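To make the double negative concrete, here is a minimal sketch that runs both character classes over a short probe string. The probe string and the variable names are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical probe string: a vowel, an underscore, a digit,
# a space, a consonant and a punctuation mark.
probe = "a_1 b!"

# [^\W] is "not a non-word character", i.e. the same as \w.
word_chars = re.findall(r'[^\W]', probe)
print(word_chars)

# Adding vowels, underscore and digits to the negated class excludes them too.
# re.I makes the vowel exclusion apply to both cases.
consonants = re.findall(r'[^AEIOU_0-9\W]', probe, flags=re.I)
print(consonants)
```

On plain ASCII input only the consonant survives the second class. In Python 3, \W is Unicode-aware by default, so accented letters would presumably still pass; that may or may not be what you want.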

Upvotes: 2

CertainPerformance

Reputation: 370929

Use an ordinary character set composed of alphabetical characters, excluding the vowels, with word boundaries at each end:

(?i)\b[b-df-hj-np-tv-z]+\b

https://regex101.com/r/DqGuY1/1

  • (?i) - Case-insensitive match
  • \b - Word boundary
  • [b-df-hj-np-tv-z]+ - Repeat one or more of:
    • characters in the range of b-d, or f-h, or j-n, or p-t, or v-z
  • \b - Word boundary

More readably, but less elegantly, you could also use

(?i)\b(?:(?![eiou])[b-z])+\b
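As a rough sketch of how the two patterns above could be used from Python, assuming the sample text from the question (the variable names are only illustrative):

```python
import re

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

# Explicit consonant ranges; the inline (?i) flag makes the match case-insensitive.
ranges_pattern = r'(?i)\b[b-df-hj-np-tv-z]+\b'
# Lookahead variant: any letter in b-z that is not e, i, o or u.
lookahead_pattern = r'(?i)\b(?:(?![eiou])[b-z])+\b'

ranges_result = re.findall(ranges_pattern, text)
lookahead_result = re.findall(lookahead_pattern, text)
print(ranges_result)
print(lookahead_result)
```

Both calls should give the same list for this input, since the two patterns describe the same set of letters.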

Upvotes: 2

Code Maniac

Reputation: 37745

[^aeiouAEIOU]

This means "match any character except aeiouAEIOU", so it also matches non-alphabetic characters such as spaces and newlines, which is not what you want since you are after words only.

So simply match all the letters other than vowels:

\b[bcdfghjklmnpqrstvwxyz]+\b
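A minimal sketch of this pattern in Python, using the question's sample text. The second string ("Hmm xx") is a made-up example to show that the class as written is lowercase only; passing re.IGNORECASE is one possible way to cover capitalised words, assuming that is wanted.

```python
import re

text = """line with every word a vowel
sntshk xx yy.
Okay zz fine."""

pattern = r'\b[bcdfghjklmnpqrstvwxyz]+\b'

# On the question's text the lowercase-only class is enough.
result = re.findall(pattern, text)
print(result)

# Hypothetical input with a capitalised vowel-less word.
sample = "Hmm xx"
case_sensitive = re.findall(pattern, sample)
case_insensitive = re.findall(pattern, sample, flags=re.IGNORECASE)
print(case_sensitive)
print(case_insensitive)
```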


Upvotes: 1
