Reputation: 73
I'm trying to write a regular expression in Java that will match a word of length n that has at least x vowels in it.
So far I've come up with the following:
// match words that are length 10 and have at least 2 vowels in them
(?=\w{10})(?:[^aeiou\W]*[aeiuo]){2}\w+
This seems to work, but it also matches words longer than 10 characters, e.g.:
wildernesses - matches
volatilizations - matches
voiceprint - matches (this should be the only match)
I would like it so that the length=10 constraint is enforced. I suspect that it may have something to do with the fact that I'm adding letters (the vowels) to the length of the string, but I'm not certain. Any help / guidance will be appreciated.
Upvotes: 3
Views: 2109
Reputation: 81
Try this out... (?<=\b|\p{Punct})(?:(?i)(?:aeiou{2,})|(?:a-z&&[^aeiou]{3,}))(?<=\w{10})
I tested this against sample data and it seems to work. In my example, I've accounted for punctuation.
Upvotes: 0
Reputation: 425033
You can simplify this greatly by using a single lookahead (written as a Java String):
"(?i)\\b(?=([^aeiou ]*[aeiou]){2,})[a-z]{10}\\b"
Note that all other answers use \w for letters, but \w includes the underscore character, which is not a letter.
(?i) turns on case insensitivity.
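To try the pattern against the words from the question, something like the following sketch should do. The class name `TenLetterWords` and the helper `findAll` are made up for illustration; only `java.util.regex` from the standard library is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TenLetterWords {
    // The pattern from this answer: a whole word of exactly 10 letters
    // with at least 2 vowels, matched case-insensitively.
    static final Pattern TEN_LETTERS_TWO_VOWELS =
            Pattern.compile("(?i)\\b(?=([^aeiou ]*[aeiou]){2,})[a-z]{10}\\b");

    // Hypothetical helper: collects every match found in the text, in order.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TEN_LETTERS_TWO_VOWELS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [voiceprint]
        System.out.println(findAll("wildernesses volatilizations voiceprint"));
        // prints [] because the underscore is not in [a-z]
        System.out.println(findAll("snake_case"));
    }
}
```

The second call shows the practical difference from \w: `snake_case` is 10 word characters long, but it is rejected here because `[a-z]` does not accept the underscore.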
Upvotes: 2
Reputation: 97571
Use word boundaries, \b, to prevent the match happening halfway through a word:
\b(?=\w{10}\b)(?:[^aeiou\W]*[aeiuo]){2,}[^aeiou\W]*\b
Of the three sample words, this will match only voiceprint:
wildernesses - no match
voiceprint - matches
volatilizations - no match
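Since the question asks for a general length n and vowel count x, here is a rough sketch of how the same pattern could be built from two parameters. The class `WordVowelPattern` and the methods `wordPattern` and `containsMatch` are hypothetical names, not part of any library, and the pattern only counts lowercase vowels because it has no (?i) flag.

```java
import java.util.regex.Pattern;

class WordVowelPattern {
    // Hypothetical helper: builds the pattern above for a word of exactly
    // `length` word characters with at least `minVowels` lowercase vowels.
    static Pattern wordPattern(int length, int minVowels) {
        String regex = "\\b(?=\\w{" + length + "}\\b)"
                + "(?:[^aeiou\\W]*[aeiou]){" + minVowels + ",}"
                + "[^aeiou\\W]*\\b";
        return Pattern.compile(regex);
    }

    // True if the pattern matches anywhere in the given text.
    static boolean containsMatch(String text, int length, int minVowels) {
        return wordPattern(length, minVowels).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] words = {"wildernesses", "voiceprint", "volatilizations"};
        for (String word : words) {
            // prints false, true, false for the three words respectively
            System.out.println(word + " -> " + containsMatch(word, 10, 2));
        }
    }
}
```

The lookahead `(?=\w{n}\b)` is what pins the length: it requires exactly n word characters before the next word boundary, so longer words fail before the vowel-counting group is even tried.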
Upvotes: 3