Reputation: 21
I want to find consonant clusters with regex. An example of a cluster is mpl in examples.
To start, I filtered out all the vowels and replaced them with spaces. With vowels filtered out, examples is x mpl s.
How can I filter out the x and the s too?
Upvotes: 2
Views: 3501
Reputation: 8938
Since your working definition of "consonant cluster" is two or more consonants in succession, you can simply use the following pattern (case-insensitively if you want to handle capital consonants):
[bcdfghjklmnpqrstvwxyz]{2,}
[bcdfghjklmnpqrstvwxyz] – a simple whitelist character class for consonants (i.e. one that will only match a consonant)
{2,} – two or more in succession
You can test the pattern against a couple of input strings in a related regex fiddle.
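If you happen to be working in Python (an assumption on my part, since the question doesn't name a language), a minimal sketch of applying the pattern could look like the following. The names CLUSTER and consonant_clusters are just illustrative.

```python
import re

# Two or more consecutive consonants; y counts as a consonant here.
# re.IGNORECASE lets the same class match capital consonants too.
CLUSTER = re.compile(r"[bcdfghjklmnpqrstvwxyz]{2,}", re.IGNORECASE)


def consonant_clusters(text):
    """Return every run of two or more consonants found in text."""
    return CLUSTER.findall(text)


print(consonant_clusters("examples"))   # ['mpl']
print(consonant_clusters("Strengths"))  # ['Str', 'ngths']
```

The lone x and s in examples never show up because a single consonant cannot satisfy {2,}, so there is no need to blank out the vowels first.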
Note that since vowels are "a, e, i, o, u, and sometimes y", I have included y in the whitelist character class for consonants above. You could drop y and use...
[bcdfghjklmnpqrstvwxz]{2,}
...if you want to unconditionally treat y as a vowel rather than a consonant; but the rules for when y is a consonant are a bit more complicated than a simple regex will handle (basically requiring that you identify syllables first, then y's location within them).
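To see how much the choice matters, here is a rough Python comparison of the two character classes (Python and the helper name clusters are my assumptions, not something from the question):

```python
import re

# Same pattern twice: once with y in the consonant class, once without.
WITH_Y = re.compile(r"[bcdfghjklmnpqrstvwxyz]{2,}", re.IGNORECASE)
WITHOUT_Y = re.compile(r"[bcdfghjklmnpqrstvwxz]{2,}", re.IGNORECASE)


def clusters(word, y_is_consonant=True):
    """Find consonant clusters, treating y as a consonant or as a vowel."""
    pattern = WITH_Y if y_is_consonant else WITHOUT_Y
    return pattern.findall(word)


print(clusters("rhythm"))                        # ['rhythm']
print(clusters("rhythm", y_is_consonant=False))  # ['rh', 'thm']
```

Neither result is linguistically right for every word; the flag only switches between the two unconditional treatments of y.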
Upvotes: 1
Reputation: 2393
Turning a comment into an answer…
As you changed vowels into white space: search for \b.\b (or \b\w\b to target a bit better) and replace with a blank, to get rid of all isolated letters, leaving you with sequences of at least two.
Like RegEx101.
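A small sketch of that two-step approach, assuming Python and lowercase input (the function name clusters_by_blanking is made up for illustration):

```python
import re


def clusters_by_blanking(word):
    """Blank out vowels, then blank out isolated letters, then split."""
    # Step 1 (from the question): replace each vowel with a space.
    spaced = re.sub(r"[aeiou]", " ", word)
    # Step 2: any single word character with a word boundary on both
    # sides is an isolated letter, so replace it with a space as well.
    cleaned = re.sub(r"\b\w\b", " ", spaced)
    # split() with no argument drops the leftover runs of whitespace.
    return cleaned.split()


print(clusters_by_blanking("examples"))  # ['mpl']
```

Keep in mind that \w also matches digits and the underscore, so this is only safe on input that consists of letters and whitespace.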
Upvotes: 0
Reputation: 174776
Seems like you want something like this:
(?:(?![aeiou])[a-z]){2,}
(?![aeiou])[a-z] means: match any lowercase letter that is not a, e, i, o or u. In other words, it matches a lowercase consonant.
(?:(?![aeiou])[a-z]){2,} matches such a consonant two or more times in a row.
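For what it's worth, a quick Python check of this pattern might look like the sketch below (Python is my assumption; lookahead_clusters is a hypothetical helper name):

```python
import re

# The negative lookahead (?![aeiou]) rejects vowels, and [a-z] then
# consumes one lowercase letter, so each repetition is one consonant.
LOOKAHEAD_CLUSTER = re.compile(r"(?:(?![aeiou])[a-z]){2,}")


def lookahead_clusters(text):
    """Return runs of two or more lowercase consonants."""
    # The group is non-capturing, so findall returns the whole match.
    return LOOKAHEAD_CLUSTER.findall(text)


print(lookahead_clusters("examples"))  # ['mpl']
```

As written the pattern only covers lowercase letters and treats y as a consonant; compile it with re.IGNORECASE if capitals should match too.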
Upvotes: 1