sgelena

Reputation: 23

How do I find words with three or more vowels (of the same kind) with regex using back referencing?

How can I find words that contain the same vowel three or more times, using a regular expression with a backreference?

I'm searching a tab-separated text file with three columns: Word, PoS, Lemma.

This is what I have so far:

ggrep -P -i --colour=always '^\w*([aeioueöäüèéà])\w*?\1\w*?\1\w*?\t' filename

My first attempt used .*? where the command above now has \w*?, and that version gave me words with three vowels that were not all of the same kind. I was confused, because I thought the backreference \1 would match only the same vowel that the bracketed group had captured. I solved the problem by changing the .*? to \w*.
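
One plausible explanation for the difference, sketched here in Python's re module rather than grep: the dot also matches a tab, so .*? lets the backreference find its repeats in the PoS or Lemma column, while \w cannot cross a tab. The sample row below is made up, and the vowel class is shortened to plain ASCII for readability.

```python
import re

# Hypothetical row in the Word<TAB>PoS<TAB>Lemma layout; not from the real data.
row = "alt\tADJA\talt"

# Variant with .*? between the backreferences: "." also matches a tab.
dot_pattern = re.compile(r"^\w*([aeiou]).*?\1.*?\1.*?\t", re.IGNORECASE)

# Variant with \w*? as in the command above: \w never matches a tab.
word_pattern = re.compile(r"^\w*([aeiou])\w*?\1\w*?\1\w*?\t", re.IGNORECASE)

dot_hit = dot_pattern.search(row)
word_hit = word_pattern.search(row)

# The dot variant matches by borrowing the two A's from the PoS tag.
print(repr(dot_hit.group(0)) if dot_hit else None)
# The \w variant finds only one "a" inside the word, so there is no match.
print(word_hit)
```

If this is what happened, anchoring with ^ and staying inside \w keeps the whole match within the first column.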

Thanks for the help!

Upvotes: 2

Views: 539

Answers (3)

sseLtaH

Reputation: 11237

Using grep

$ grep -E '(([aeioueöäüèéà])[^\2]*){3,}' input_file
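
A caveat worth checking before relying on this: as far as I know, a backreference has no special meaning inside a bracket expression, so [^\2] does not mean "anything except the captured vowel". In POSIX bracket expressions the backslash is an ordinary character; in Python's re module, used for the sketch below, \2 inside a class is read as an octal escape. Either way the pattern accepts any three vowels. The test words are illustrative and the vowel class is shortened to ASCII.

```python
import re

# Same shape as the pattern above, with the vowel class shortened to ASCII.
# Inside [...] Python reads \2 as the character with code 2, not as group 2.
loose = re.compile(r"(([aeiou])[^\2]*){3,}")

# "audio" has four different vowels and none repeated, yet it still matches.
audio_hit = loose.search("audio")
print(audio_hit is not None)

# A word with fewer than three vowels is rejected, as expected.
short_hit = loose.search("hat")
print(short_hit is not None)
```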

Upvotes: -1

Sam Mason

Reputation: 16194

Your regex looks too complicated. I'm not sure what you're trying to accomplish with the .*?, but the usage looks suspect. I'd use something like:

([aeioueöäüèéà])\1\1

That is, capture a vowel in a group, then require two more occurrences of that same vowel directly after it.

Edit: I didn't realise you wanted to allow other letters between the vowels. In that case, allow zero or more "word" characters before each backreference:

([aeioueöäüèéà])(\w*\1){2}
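
A minimal sketch of how this pattern behaves, using Python's re module as a stand-in for grep -P. The word list is invented for illustration, the vowel class is shortened, and re.IGNORECASE plays the role of grep's -i.

```python
import re

# Capture one vowel, then require it twice more with any word characters between.
same_vowel = re.compile(r"([aeiouöäü])(\w*\1){2}", re.IGNORECASE)

# Illustrative words, not taken from the asker's file.
words = ["banana", "audio", "Erdbeere", "Hallo"]

# Keep only the words in which some vowel occurs at least three times.
hits = [w for w in words if same_vowel.search(w)]
print(hits)  # ['banana', 'Erdbeere']
```

"audio" is rejected even though it has four vowels, because none of them repeats.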

Upvotes: 2

Cyrus

Reputation: 88756

With GNU grep, I suggest:

grep -E --colour=always -i '\b\w*([aeioueöäüèéà])(\w*\1){2,}\w*'
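
To see what the leading \b\w* and trailing \w* add, here is a small sketch in Python's re module (an assumption on my part; the command above targets GNU grep). The sample line is made up and the vowel class is shortened.

```python
import re

# \b\w* reaches back to the start of the word, the trailing \w* runs to its end,
# so the match covers the whole word rather than only the vowel-to-vowel span.
whole_word = re.compile(r"\b\w*([aeiouöäü])(\w*\1){2,}\w*", re.IGNORECASE)

# Hypothetical input line.
line = "two bananas please"

hit = whole_word.search(line)
print(hit.group(0) if hit else None)  # bananas
```

With --colour=always this means the entire word is highlighted, not just the part between the first and last repeated vowel.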

See: The Stack Overflow Regular Expressions FAQ

Upvotes: 1
