Reputation: 23
How can I find words with three or more vowels of the same kind with a regular expression using back referencing?
I'm searching in text with a 3-column tab format "Word+PoS+Lemma".
This is what I have so far:
ggrep -P -i --colour=always '^\w*([aeioueöäüèéà])\w*?\1\w*?\1\w*?\t' filename
However, this gives me words with three vowels but not of the same kind.
I'm confused, because I thought the back referencing would refer to the same vowel it found in the brackets? I solved this problem by changing the .*? to \w*.
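For anyone wondering why that change helps, here is a minimal sketch in Python's re module (not grep, so treat it as an illustration only). It assumes the earlier pattern simply had .*? wherever the one above has \w*?, and the three sample lines are made up. The likely explanation is that . also matches the tab, so the back references can be satisfied by letters in the PoS column, while \w cannot cross the tab.

```python
import re

VOWELS = "aeiouöäüèéà"

# Assumed earlier version: .*? between the back references (hypothetical reconstruction).
loose = re.compile(rf"^\w*([{VOWELS}]).*?\1.*?\1.*?\t", re.IGNORECASE)
# Version from the question: \w*? cannot match a tab, so it stays inside the first column.
strict = re.compile(rf"^\w*([{VOWELS}])\w*?\1\w*?\1\w*?\t", re.IGNORECASE)

# Made-up sample lines in the Word<TAB>PoS<TAB>Lemma format.
lines = [
    "banana\tNOUN\tbanana",
    "were\tVERB\tbe",
    "house\tNOUN\thouse",
]

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]

print(loose_hits)   # includes the "were" line: the third "e" comes from "VERB"
print(strict_hits)  # only the "banana" line
```

In this sketch "were" has only two e's, but the loose pattern still matches the line because it picks up the E in VERB before the second tab.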
Thanks for the help!
Upvotes: 2
Views: 539
Reputation: 16194
Your regex looks too complicated. I'm not sure what you're trying to accomplish with the .*?, but the usage looks suspect. I'd use something like:
([aeioueöäüèéà])\1\1
i.e. match a vowel in a capture group, then require the same vowel twice more, immediately after it.
Edit: I didn't realise you wanted to allow other letters between the vowels. For that, just allow zero or more "word" characters before each backreference:
([aeioueöäüèéà])(\w*\1){2}
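A quick way to sanity-check this pattern is to run it over a few words. The sketch below uses Python's re module rather than grep, and the helper name has_repeated_vowel and the test words are only for illustration. The duplicate e in the character class is dropped because it makes no difference.

```python
import re

# Capture one vowel, then require it twice more with any word characters in between.
pattern = re.compile(r"([aeiouöäüèéà])(\w*\1){2}", re.IGNORECASE)

def has_repeated_vowel(word: str) -> bool:
    """Return True if some vowel occurs at least three times in the word."""
    return pattern.search(word) is not None

for word in ["banana", "Erdbeere", "Mississippi", "house", "kiwi"]:
    print(word, has_repeated_vowel(word))
```

With re.IGNORECASE the backreference also matches case-insensitively, so the capital E in "Erdbeere" counts together with the lower-case ones.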
Upvotes: 2
Reputation: 88756
With GNU grep, I suggest:
grep -E --colour=always -i '\b\w*([aeioueöäüèéà])(\w*\1){2,}\w*'
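The leading \b\w* and trailing \w* are there so that the whole word is matched (and therefore highlighted), not just the stretch from the first to the last repeated vowel. As a rough illustration, here is the same pattern in Python's re module instead of grep. The sample sentence is made up.

```python
import re

# Same pattern as the grep command above, compiled case-insensitively.
word_re = re.compile(r"\b\w*([aeiouöäüèéà])(\w*\1){2,}\w*", re.IGNORECASE)

text = "a banana and an Erdbeere in the house"

# group(0) is the full match, i.e. the complete word.
matches = [m.group(0) for m in word_re.finditer(text)]
print(matches)
```

Using {2,} rather than {2} does not change which words are found, since the trailing \w* absorbs any further letters either way.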
See: The Stack Overflow Regular Expressions FAQ
Upvotes: 1