I am trying to break a word into syllables in Prolog according to two rules:
rule 1: vowel-consonant-vowel (break the word after the second vowel)
rule 2: vowel-consonant-consonant-vowel (break the word between the two consonants)
For example, calculator = cal-cula-tor.
I already have the following code in Prolog, but it only analyzes the first 3 or 4 letters of the word.
I need it to process and analyze the entire word.
vowel(a).
vowel(e).
vowel(i).
vowel(o).
vowel(u).
consonant(L):- not(vowel(L)).
syllable(W, S, RW):-
    atom_chars(W, [V1, C, V2|Tail]),
    vowel(V1),
    consonant(C),
    vowel(V2),
    !,
    atomic_list_concat([V1, C, V2], S),
    atomic_list_concat(Tail, RW).
syllable(W, S, RW):-
    atom_chars(W, [V1, C, C2, V2|Tail]),
    vowel(V1),
    consonant(C),
    consonant(C2),
    vowel(V2),
    !,
    atomic_list_concat([V1, C, C2, V2], S),
    atomic_list_concat(Tail, RW).
syllable(W, W, _).
break(W, B):-
    syllable(W, B, ''), !.
break(W, B):-
    syllable(W, S, RW),
    break(RW, B2),
    atomic_list_concat([S, '-', B2], B).
Upvotes: 2
Views: 732
I guess it's time for a DCG push-back solution. Push-back is used in the second rule of break//1: it reflects that we look at four characters but consume only two:
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- \+ vowel(C).
break([V1,C,V2]) -->
    [V1,C,V2],
    {vowel(V1), consonant(C), vowel(V2)}.
break([V1,C1]), [C2,V2] -->
    [V1,C1,C2,V2],
    {vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.
syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].
So the overall solution needs no extra predicates such as append/3 or reverse/2. We have also placed a cut to prune the search, which is possible because the second rule of syllables//1 is a catch-all that accepts any character.
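To see push-back in isolation, here is a minimal sketch. The nonterminal first_of_two//1 is a hypothetical name used only for illustration and is not part of the solution above. It looks at two list elements but puts the second one back, so only the first is consumed:

```prolog
% Hypothetical nonterminal for illustration only:
% read two elements, then push the second one back onto the input,
% so that only the first element is actually consumed.
first_of_two(X), [Y] --> [X, Y].

% ?- phrase(first_of_two(X), [a,b,c], Rest).
% X = a, Rest = [b,c].
```

The second rule of break//1 follows the same pattern, with two elements pushed back instead of one.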
Here are some example runs:
Jekejeke Prolog 2, Laufzeitbibliothek 1.1.6
(c) 1985-2016, XLOG Technologies GmbH, Schweiz
?- set_prolog_flag(double_quotes, chars).
Ja
?- phrase(syllables(R), "calculator").
R = [[c,a,l],[c,u,l,a],[t,o,r]] ;
Nein
?- phrase(syllables(R), "kitchensink").
R = [[k,i,t,c,h,e,n],[s,i,n,k]] ;
Nein
P.S.: In some older draft standards this DCG technique was called "right-hand-context", and instead of the verb "push back", the verb "prefixing" was used. In a newer draft standard this is called "semicontext", and instead of the verb "push back", the verb "restoring" is used.
https://www.complang.tuwien.ac.at/ulrich/iso-prolog/dcgs/dcgsdraft-2015-11-10.pdf
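Since the question asks for a hyphenated word rather than a list of character lists, the result of syllables//1 can be joined back together. The following is only a sketch under assumptions: it assumes SWI-Prolog (atomic_list_concat/3 and exclude/3 are not ISO), and hyphenated/2 is a hypothetical helper name that does not appear in the answer above.

```prolog
% Sketch assuming SWI-Prolog; hyphenated/2 is a hypothetical helper.
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- \+ vowel(C).

% break//1 and syllables//1 exactly as in the answer above.
break([V1,C,V2]) -->
    [V1,C,V2],
    { vowel(V1), consonant(C), vowel(V2) }.
break([V1,C1]), [C2,V2] -->
    [V1,C1,C2,V2],
    { vowel(V1), consonant(C1), consonant(C2), vowel(V2) }.

syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].

% hyphenated(+Word, -Hyphenated)
% Word is an atom; Hyphenated is the same word with '-' between syllables.
hyphenated(Word, Hyphenated) :-
    atom_chars(Word, Chars),
    once(phrase(syllables(Syllables0), Chars)),
    % A word ending exactly at a break yields a trailing []; drop it.
    exclude(==([]), Syllables0, Syllables),
    maplist(atom_chars, Atoms, Syllables),
    atomic_list_concat(Atoms, '-', Hyphenated).

% ?- hyphenated(calculator, H).
% H = 'cal-cula-tor'.
```

once/1 commits to the first parse, which matches the single answer shown in the example runs.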
Upvotes: 1
Reputation: 12992
I think you could write it more simply. Here is my implementation:
syllable(Input, Final_Word) :-
    atom_chars(Input, Char_list),
    (   split(Char_list, Word)
    ->  atom_chars(Final_Word, Word)
    ;   Final_Word = Input
    ).
split([], []).
split([X,Y,Z|T], [X,Y,Z,'-'|T1]) :-
    vowel(X), vowel(Z),
    atom_chars(Input, T),
    syllable(Input, T2),
    atom_chars(T2, T1).
split([X,Y,Z,W|T], [X,Y,'-',Z|T1]) :-
    vowel(X), \+ vowel(Y), \+ vowel(Z), vowel(W),
    atom_chars(Input, [W|T]),
    syllable(Input, T2),
    atom_chars(T2, T1).
split([X|T], [X|T1]) :- \+ vowel(X), split(T, T1).
split/2 splits the word, adding '-' wherever the rules you stated allow it, and returns a list to syllable/2. atom_chars/2 transforms the list into a word. If the word cannot be split, the output is the input.
Example:
?- syllable(calculator,L).
L = 'calcu-lato-r'.
I don't understand why you wrote 'calculator = cal-cula-tor', since it doesn't follow the rules you stated: "cal" is not vowel-consonant-vowel but consonant-vowel-consonant, and the same goes for the rest of the word.
Upvotes: 1
Reputation: 40768
First, a setting that makes it much more convenient to specify lists of characters, and which I recommend you use in your code if you process text a lot:
:- set_prolog_flag(double_quotes, chars).
Second, the data, represented in such a way that the definitions can be used in all directions:
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- maplist(dif(C), [a,e,i,o,u]).
For example:
?- consonant(C).
dif(C, u),
dif(C, o),
dif(C, i),
dif(C, e),
dif(C, a).
whereas the version you posted incorrectly says that there is no consonant:
?- consonant(C).
false.
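The practical difference shows up as soon as the variable is bound after the call. The sketch below puts both definitions side by side; the names consonant_naf/1 and consonant_dif/1 are hypothetical and used only to tell the two versions apart. It assumes a system that provides dif/2, such as SWI-Prolog.

```prolog
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

% Version from the question: negation as failure.
% With an unbound argument, vowel(C) succeeds, so \+ vowel(C) fails.
consonant_naf(C) :- \+ vowel(C).

% Version from this answer: the constraint is delayed until C is known.
consonant_dif(C) :- maplist(dif(C), [a,e,i,o,u]).

% ?- consonant_naf(C), C = b.
% false.
% ?- consonant_dif(C), C = b.
% C = b.
% ?- consonant_dif(C), C = a.
% false.
```

With dif/2 the order of the goals no longer matters, which is what allows the definitions to be used in all directions.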
The rules you outline are readily described in Prolog:
% rule 1: vowel-consonant-vowel (break after second vowel)
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
        vowel(V1), consonant(C), vowel(V2),
        reverse([V2,C,V1|Bs0], Bs).

% rule 2: vowel-consonant-consonant-vowel (break between the consonants)
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
        vowel(V1), consonant(C1), consonant(C2), vowel(V2),
        reverse([C1,V1|Bs0], Bs).

% alternative: no break at this position
rule([L|Ls], Bs0, Bs, Rest) :-
        rule(Ls, [L|Bs0], Bs, Rest).
Exercise: Why am I writing [V2,C,V1|_] instead of [V1,C,V2|...] in the call of reverse/2?
Now, it only remains to describe the list of resulting syllables. This is easy with DCG notation:
word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
        { rule([L|Ls], [], Bs, Rest) },
        word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].
Now the point: Since this program is completely pure and does not incorrectly commit prematurely, we can use it to show that there are also other admissible hyphenations:
?- phrase(word_breaks("calculator"), Hs).
Hs = [[c, a, l], [c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o, r]] ;
Hs = [[c, a, l, c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l, c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l, c, u, l, a, t, o, r]].
In Prolog, it is good practice to retain the generality of your code so that you can readily observe alternative solutions. See logical-purity.
Upvotes: 3