Dev Dog

Reputation: 75

Searching for multiple matches on one line using Grep and Regex

I'm trying to use grep with wc -l to print the number of words in a text file that have 3 or more vowels in a row.

Right now, I'm running:

grep -i -E '\<.*[aeiou]{3}.*\>' file.txt | wc -l

but this doesn't return the correct number of words, because some lines contain more than one word with 3 vowels in a row.

If file.txt contains this:

beautiful courteous 
beautiful 
courteous

My desired output is 4, but currently I can only get 3.
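A minimal reproduction of the problem, as a sketch: it assumes GNU grep (for the \< and \> word anchors) and feeds the sample text through printf instead of reading file.txt. The variable names are only for illustration.

```shell
#!/bin/sh
# Assumes GNU grep, which accepts \< and \> in -E (extended regex) mode.
sample='beautiful courteous
beautiful
courteous'

# The command from the question, run on the sample instead of file.txt.
# grep prints each matching LINE once, so wc -l sees 3 lines, not 4 words.
# tr -d ' ' strips the padding some wc implementations add.
count=$(printf '%s\n' "$sample" | grep -i -E '\<.*[aeiou]{3}.*\>' | wc -l | tr -d ' ')
echo "$count"
```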

I've been looking online for a while for a solution but I just can't seem to figure it out. Can anyone assist?

Upvotes: 5

Views: 7481

Answers (2)

John1024

Reputation: 113834

To get each matching word on a separate line, use the -o option:

$ grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*' file.txt
beautiful
courteous
beautiful
courteous
$ grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*' file.txt | wc -l
4

[[:alnum:]]*[aeiou]{3}[[:alnum:]]* matches words with three consecutive vowels. -o ensures that each matching word is printed on a separate line.

If you want to be stricter about the definition of a word, you may want instead to use [[:alpha:]]*[aeiou]{3}[[:alpha:]]*.
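A small sketch of how the two character classes differ, using a made-up line (not from the question) in which a digit is glued to a word. It assumes GNU grep; with [[:alnum:]] the digit and the letters after it become part of the match, while [[:alpha:]] stops at the digit.

```shell
#!/bin/sh
# Hypothetical input: "beautiful2day" has a digit inside the "word".
input='beautiful2day courteous'

# [[:alnum:]] treats letters AND digits as word characters.
alnum=$(printf '%s\n' "$input" | grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*')

# [[:alpha:]] treats only letters as word characters, so the match ends at "2".
alpha=$(printf '%s\n' "$input" | grep -iEo '[[:alpha:]]*[aeiou]{3}[[:alpha:]]*')

printf 'alnum:\n%s\n' "$alnum"
printf 'alpha:\n%s\n' "$alpha"
```

Both variants still find two words on this line, so the count is the same; only what counts as "the word" changes.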

Documentation

From man grep:

-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
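To see what -o changes for counting, here is a sketch that runs the same pattern over the question's first line with and without the option and counts the output lines. It assumes GNU grep and a POSIX shell.

```shell
#!/bin/sh
line='beautiful courteous'
pattern='[[:alnum:]]*[aeiou]{3}[[:alnum:]]*'

# Without -o: the one matching line is printed once.
plain=$(printf '%s\n' "$line" | grep -iE "$pattern" | wc -l | tr -d ' ')

# With -o: each match is printed on its own output line.
only=$(printf '%s\n' "$line" | grep -iEo "$pattern" | wc -l | tr -d ' ')

echo "without -o: $plain"
echo "with -o: $only"
```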

Discussion

Consider:

'\<.*[aeiou]{3}.*\>'

In the above, note that . matches any character and .* is greedy: it matches the longest possible string. Thus, \<.*[aeiou]{3} will match from the beginning of the first word on a line to the last occurrence of three vowels in a row on that line. The final .*\> then matches from there to the end of the last word on the line. This is not what you need. Moreover, without -o, grep prints each matching line only once, so wc -l counts lines, not words.
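The greedy behaviour can be observed directly by adding -o to the question's pattern. This sketch assumes GNU grep for \< and \>; the second, per-word pattern is an illustrative alternative that replaces .* with [[:alpha:]]*, not something taken from the question.

```shell
#!/bin/sh
# Assumes GNU grep (\< and \> word anchors in -E mode).
line='beautiful courteous'

# Question's pattern: .* can cross spaces, so the single match
# spans from the first word's start to the last word's end.
greedy=$(printf '%s\n' "$line" | grep -iEo '\<.*[aeiou]{3}.*\>')

# Letters-only filler cannot cross the space, so each word matches separately.
per_word=$(printf '%s\n' "$line" | grep -iEo '\<[[:alpha:]]*[aeiou]{3}[[:alpha:]]*\>')

printf 'greedy:\n%s\n' "$greedy"
printf 'per word:\n%s\n' "$per_word"
```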

Upvotes: 6

yorammi

Reputation: 6458

You can do it in two steps.

First, split the file into words, one per line:

tr -s '[:punct:][:space:]' '\n' < file.txt > wordsFile.txt
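A sketch of what the splitting step produces, using a hypothetical punctuated sample in place of file.txt. tr replaces every punctuation or whitespace character with a newline, and -s squeezes each run of resulting newlines into one, so no empty lines appear between words.

```shell
#!/bin/sh
# Hypothetical sample text (not from the question).
text='Beautiful, courteous people.
Queueing... again!'

# Map punctuation and whitespace to newlines; -s squeezes repeats.
words=$(printf '%s\n' "$text" | tr -s '[:punct:][:space:]' '\n')
printf '%s\n' "$words"
```

Note that this also splits contractions such as "don't" into two pieces, since the apostrophe is punctuation.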

Then count the matching words:

grep -i -E '[aeiou]{3}' wordsFile.txt | wc -l
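The two steps can also be chained without the intermediate wordsFile.txt. This is a sketch on the question's sample text; it swaps wc -l for grep -c, which counts matching lines itself and prints no padding.

```shell
#!/bin/sh
# The question's sample; in practice you would read file.txt instead.
sample='beautiful courteous
beautiful
courteous'

# One word per line, then count the lines (words) with 3+ consecutive vowels.
count=$(printf '%s\n' "$sample" | tr -s '[:punct:][:space:]' '\n' | grep -ciE '[aeiou]{3}')
echo "$count"
```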

Upvotes: 0
