Firefly

Reputation: 459

AWK - count the number of syllables in a word

I need to know whether words are monosyllabic or polysyllabic. My plan is to count the number of blocks of consecutive vowels in each word.

I tried this regex, but it does not work well for all the words:

number_of_vowels=match($1,"[aouöüeiáóúőűéí]?[aouöüeiáóúőűéí]");

Input

könyvtaár
könyvter
hozzászóles
mű
cikk
ős

Desired output

könyvtaár    2    polysyllabic
könyvter    2    polysyllabic
hozzászóles    4    polysyllabic
mű    1    monosyllabic
cikk    1    monosyllabic
ős    1    monosyllabic

Now I'm using this regex

a=match($1,"[aouöüeiáóúőűéí]+");

For the word "hozzászóles" it gives me 2, not 4.

For reference, these are the consonants: "b c cs d dz dzs f g gy h j k l ly m n ny p q r s sz t ty v w x y z zs"

Upvotes: 0

Views: 217

Answers (2)

anubhava

Reputation: 785481

You can use this awk command:

awk -F '[aouöüeiáóúőűéí]+' 'NF{
        print $0, NF-1, (NF>2) ? "polysyllabic" : "monosyllabic"}' file | column -t

Output:

könyvtaár    2  polysyllabic
könyvter     2  polysyllabic
hozzászóles  4  polysyllabic
mű           1  monosyllabic
cikk         1  monosyllabic
ős           1  monosyllabic
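To see why NF-1 is the count, it can help to print the fields awk produces when runs of vowels act as the field separator. This is only a sketch: show_fields is a made-up helper name, and it assumes UTF-8 input with an awk that handles it the same way as the command above.

```shell
# Hypothetical helper (not part of the answer's command): print each field
# in brackets, followed by the field count.
show_fields() {
    printf '%s\n' "$1" |
    awk -F '[aouöüeiáóúőűéí]+' '{
        out = ""
        # Every run of vowels is a separator, so a word with k runs
        # is cut into k+1 fields (some of them possibly empty).
        for (i = 1; i <= NF; i++) out = out "[" $i "]"
        print out, "NF=" NF
    }'
}

show_fields 'hozzászóles'
show_fields 'mű'
```

For hozzászóles this should print [h][zz][sz][l][s] NF=5, and for mű it should print [m][] NF=2. The empty trailing field after the final vowel is what keeps NF-1 equal to the number of vowel blocks even when a word ends in a vowel.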

Upvotes: 2

Ed Morton

Reputation: 203985

If you want to use an awk function to count occurrences of a regexp (e.g. if it's part of a larger script), then you need to use split() or gsub(), not match(), which only returns the position of the first match:

$ awk '{a=split($0,t,/[aouöüeiáóúőűéí]+/); print $0, a-1, (a>2?"poly":"mono")"syllabic"}' file
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic

$ awk '{t=$0; a=gsub(/[aouöüeiáóúőűéí]+/,"",t); print $0, a, (a>1?"poly":"mono")"syllabic"}' file
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic

but if you don't need a function to do it then just use @anubhava's approach.
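As for the 2 in the question: match() returns the position where the leftmost match starts (and sets RSTART and RLENGTH), and in "hozzászóles" the first vowel is the second character. If you did want to stay with match(), you would have to call it in a loop and cut off the part of the string already scanned. A rough sketch, where first_pos and count_runs are made-up helper names and UTF-8 input is assumed:

```shell
# Hypothetical helper: what a single match() call returns.
first_pos() {
    printf '%s\n' "$1" | awk '{ print match($0, /[aouöüeiáóúőűéí]+/) }'
}

# Hypothetical helper: count runs of vowels by calling match() repeatedly.
count_runs() {
    printf '%s\n' "$1" |
    awk '{
        s = $0; n = 0
        # Each call finds only the leftmost run, so drop everything up to
        # the end of that run and search the remainder again.
        while (match(s, /[aouöüeiáóúőűéí]+/)) {
            n++
            s = substr(s, RSTART + RLENGTH)
        }
        print n
    }'
}

first_pos 'hozzászóles'    # prints 2, the index of the first vowel
count_runs 'hozzászóles'   # prints 4, the number of vowel runs
```

This is more code than the split() or gsub() versions for the same result, so it is mainly useful for seeing what match() actually reports.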

Upvotes: 0
