Reputation: 616

Linux Ubuntu Bash - Find words containing more than 2 vowels using AWK regular expressions

I want to print all the words containing more than 2 vowels from a file using awk.

This is my code so far:

#!/bin/bash
cat "$1" | awk '{   # Default field separator is whitespace
for (i=1;i<=NF;i++)  #for every word          
  {
  if ($i ~ /([aeiojy]){2,}/)            
    {
      print $i
    }
}}'

The regular expression is the problem.

My current attempt is /([aeiojy]){2,}/, but it doesn't work.

Upvotes: 1

Answers (2)

anubhava

Reputation: 785481

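You can use the split function in awk:

awk -v RS=' ' 'split($0, a, /[aeiouAEIOU]/) > 3' file

-v RS=' ' makes awk treat each space-separated word as a separate record. Note that a newline then no longer ends a record, so the last word of one line and the first word of the next end up in the same record.
split returns the number of pieces, which is the number of vowels plus one, so a value greater than 3 means the word contains at least 3 vowels (more than 2).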
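
A variant that stays closer to the loop in the question is to count the vowels in each field with gsub, which returns the number of substitutions it made. This is only a minimal sketch: the function name vowel_words_count is hypothetical, and it assumes that only a, e, i, o and u (in either case) count as vowels.

```shell
#!/bin/bash
# Sketch: print every whitespace-separated word with more than 2 vowels.
# vowel_words_count is a hypothetical name chosen for this example.
vowel_words_count() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = $i                                  # work on a copy of the field
      # Replacing each vowel with itself ("&") leaves w unchanged,
      # but gsub still returns how many vowels it matched.
      if (gsub(/[aeiouAEIOU]/, "&", w) > 2) {
        print $i
      }
    }
  }' "$@"
}

printf 'beautiful sky\nqueue tree education\n' | vowel_words_count
```

Because the default field splitting is kept, words separated by newlines or tabs are handled the same way as words separated by spaces.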

Upvotes: 0

hek2mgl

Reputation: 158080

This should work with GNU grep:

grep -Poi '([^[:space:]]*?[aeiou]){3,}[^[:space:]]*' file

Options:

-P use Perl-compatible regular expressions
-o print only the matched parts, each on its own line
-i case-insensitive match

The regex:

(                start of subpattern
  [^[:space:]]*  zero or more arbitrary non-whitespace characters
  ?              makes the preceding * ungreedy (lazy); Perl-specific
  [aeiou]        a vowel
)                end of subpattern
{3,}             the subpattern appears 3 or more times
[^[:space:]]*    zero or more remaining non-whitespace characters, up to the end of the word
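
The same idea (three vowels that do not have to be adjacent) can probably be carried over to the awk loop from the question. The pattern there, /([aeiojy]){2,}/, only matches two or more consecutive characters from the set, and the set lacks u. The sketch below spells out the three vowels explicitly instead of using {3,}, on the assumption that some awk implementations do not support interval expressions; the function name vowel_words_regex is hypothetical.

```shell
#!/bin/bash
# Sketch: awk regex that requires three vowels anywhere in a word.
# vowel_words_regex is a hypothetical name chosen for this example.
vowel_words_regex() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # vowel, anything, vowel, anything, vowel:
      # a field contains no whitespace, so ".*" stays inside the word
      if ($i ~ /[aeiouAEIOU].*[aeiouAEIOU].*[aeiouAEIOU]/) {
        print $i
      }
    }
  }' "$@"
}

printf 'beautiful sky\nqueue tree education\n' | vowel_words_regex
```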

Btw, Perl-compatible regular expressions are not actually required here. With plain grep you can use:

grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' file
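
To try the plain-grep pattern without a file, it can be wrapped in a small function and fed sample text on standard input. This is just a sketch; the function name vowel_words_grep and the sample words are made up for illustration.

```shell
#!/bin/bash
# Sketch: wrap the plain-grep pattern so it can be tried on sample input.
# vowel_words_grep is a hypothetical name chosen for this example.
vowel_words_grep() {
  grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' "$@"
}

printf 'beautiful sky\nqueue tree education\n' | vowel_words_grep
```

With this input the output should be beautiful, queue and education, one per line; sky and tree have fewer than three vowels and are skipped.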

Note: I've left punctuation out of consideration in the above examples (it is matched as part of a word), but handling for it can be added if required.

Upvotes: 2
