Reputation: 75
I'm trying to use grep with wc -l to print the number of words in a text file that have 3 or more vowels in a row.
Right now, I'm inputting:
grep -i -E '\<.*[aeiou]{3}.*\>' file.txt | wc -l
but this is not returning the correct number of words, because on some lines there are multiple words that have 3 vowels in a row.
If file.txt contains this:
beautiful courteous
beautiful
courteous
my desired output would be 4, rather than 3, and currently I'm only able to get 3.
I've been looking online for a while for a solution but I just can't seem to figure it out. Can anyone assist?
Upvotes: 5
Views: 7481
Reputation: 113834
To get each matching word on a separate line, use the -o option:
$ grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*' file.txt
beautiful
courteous
beautiful
courteous
$ grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*' file.txt | wc -l
4
[[:alnum:]]*[aeiou]{3}[[:alnum:]]* matches words with three consecutive vowels. -o ensures that each matched word is printed on a separate line.
If you want to be stricter about the definition of a word, you may instead want to use [[:alpha:]]*[aeiou]{3}[[:alpha:]]*.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Consider:
\<.*[aeiou]{3}.*\>
In the above, note that . matches any character and .* is greedy: it matches the longest possible string. Thus, \<.*[aeiou]{3} will match from the beginning of the first word on a line to the last occurrence on the line of three vowels in a row. The final .*\> will match from there to the end of the last word on the line. Since grep without -o prints each matching line only once, wc -l ends up counting lines rather than words. This is not what you need.
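To see the difference on the sample data from the question, here is a minimal sketch. It assumes GNU grep (where \< and \> are word-boundary extensions) and recreates the three sample lines in a temporary file; the variable names are only for illustration.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'beautiful courteous\nbeautiful\ncourteous\n' > "$sample"

# Greedy pattern from the question: with -o, each match still spans the
# whole line, so there is only one match per line.
greedy_count=$(grep -iEo '\<.*[aeiou]{3}.*\>' "$sample" | wc -l | tr -d '[:space:]')

# Word-bounded pattern: the match cannot cross whitespace, so every
# qualifying word is a separate match.
word_count=$(grep -iEo '[[:alnum:]]*[aeiou]{3}[[:alnum:]]*' "$sample" | wc -l | tr -d '[:space:]')

echo "greedy: $greedy_count, per-word: $word_count"
rm -f "$sample"
```

On this input the greedy pattern yields 3 matches (one per line), while the word-bounded pattern yields 4.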
Upvotes: 6
Reputation: 6458
You can do it in two steps.
First, split the file into words, one per line:
tr -s '[[:punct:][:space:]]' '\n' < file.txt > wordsFile.txt
and then you count the matching words:
grep -i -E '.*[aeiou]{3}.*' wordsFile.txt | wc -l
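If the intermediate file is not needed, the same two steps can probably be chained in one pipeline. This is only a sketch: it recreates the question's sample data in a temporary file, relies on tr padding the second set with newlines (as GNU and BSD tr do), and uses grep -c in place of wc -l. Note that splitting on punctuation also breaks words such as "don't" into two pieces.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'beautiful courteous\nbeautiful\ncourteous\n' > "$sample"

# Step 1: turn every run of punctuation/whitespace into a single newline,
#         so each word lands on its own line.
# Step 2: count the lines (words) containing three vowels in a row.
count=$(tr -s '[:punct:][:space:]' '\n' < "$sample" | grep -ciE '[aeiou]{3}')

echo "$count"
rm -f "$sample"
```

For the sample data this prints 4.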
Upvotes: 0