Reputation: 459
I need to determine whether words are monosyllabic or polysyllabic. My approach is to count the number of vowel blocks (runs of consecutive vowels) in each word.
I tried this regex, but it does not work for all words:
number_of_vowels=match($1,"[aouöüeiáóúőűéí]?[aouöüeiáóúőűéí]");
Input:
könyvtaár
könyvter
hozzászóles
mű
cikk
ős
Desired output:
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic
Now I'm using this regex:
a=match($1,"[aouöüeiáóúőűéí]+");
For the word "hozzászóles" it gives me 2, not 4.
For reference, these are the consonants: "b c cs d dz dzs f g gy h j k l ly m n ny p q r s sz t ty v w x y z zs"
Upvotes: 0
Views: 217
Reputation: 785481
You can use this awk command:
awk -F '[aouöüeiáóúőűéí]+' 'NF{
print $0, NF-1, (NF>2) ? "polysyllabic" : "monosyllabic"}' file | column -t
Output:
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic
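To see why NF-1 equals the number of vowel blocks, it may help to print the fields that the separator regex leaves behind. The following is only an illustrative sketch: the function name show_fields is hypothetical and not part of the answer above. Each run of vowels acts as one field separator, so a word with k vowel blocks is split into k+1 fields (some possibly empty when the word starts or ends with a vowel).

```shell
# Hypothetical helper: show how awk splits one word on runs of vowels.
show_fields() {
    printf '%s\n' "$1" |
    awk -F '[aouöüeiáóúőűéí]+' '{
        # Each field is a consonant chunk between two vowel runs
        for (i = 1; i <= NF; i++) printf "field %d: \"%s\"\n", i, $i
        # One separator sits between each pair of adjacent fields
        print "vowel blocks:", NF - 1
    }'
}

show_fields 'hozzászóles'
```

For hozzászóles the fields are "h", "zz", "sz", "l" and "s", so NF is 5 and the count is 4.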
Upvotes: 2
Reputation: 203985
If you want to use an awk function to count occurrences of a regexp (e.g. if it's part of a larger script) then you need to use split() or gsub(), not match():
$ awk '{a=split($0,t,/[aouöüeiáóúőűéí]+/); print $0, a-1, (a>2?"poly":"mono")"syllabic"}' file
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic
$ awk '{t=$0; a=gsub(/[aouöüeiáóúőűéí]+/,"",t); print $0, a, (a>1?"poly":"mono")"syllabic"}' file
könyvtaár 2 polysyllabic
könyvter 2 polysyllabic
hozzászóles 4 polysyllabic
mű 1 monosyllabic
cikk 1 monosyllabic
ős 1 monosyllabic
but if you don't need a function to do it then just use @anubhava's approach.
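As background on why match() gave 2 for "hozzászóles": match() returns the position of the first match (and sets RSTART and RLENGTH), not the number of matches, and the first vowel in that word is its second character. If the count is needed inside a larger script, one possible way to package the gsub() variant is a small user-defined function. This is a sketch only; the names count_syllables and vowel_blocks are hypothetical, and it assumes one word per line in the first field.

```shell
# Hypothetical wrapper: reads words from the given files or from stdin.
count_syllables() {
    awk '
    # Hypothetical helper: number of vowel runs in s.
    # gsub() returns the number of substitutions made; s is a local copy,
    # so the original word is left untouched.
    function vowel_blocks(s) { return gsub(/[aouöüeiáóúőűéí]+/, "", s) }
    {
        n = vowel_blocks($1)
        # match() is printed only for contrast: it is a position, not a count
        print $1, n, (n > 1 ? "poly" : "mono") "syllabic", "first_vowel_at=" match($1, /[aouöüeiáóúőűéí]+/)
    }' "$@"
}

printf '%s\n' 'hozzászóles' 'mű' | count_syllables
```

With this input the count column shows 4 and 1, while the match() column shows 2 for both words, which matches the value reported in the question.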
Upvotes: 0