user6943086

word processing prolog

I am trying to break a word into syllables in Prolog according to two rules:

rule 1: vowel-consonant-vowel (break the word after the second vowel)
rule 2: vowel-consonant-consonant-vowel (break the word between the two consonants), for example, calculator = cal-cula-tor

I already have the following code in Prolog; however, it only analyzes the first 3 or 4 letters of the word.

I need it to process and analyze the entire word.

    vowel(a).
    vowel(e).
    vowel(i).
    vowel(o).
    vowel(u).


    consonant(L):- not(vowel(L)).

    syllable(W, S, RW):- 
        atom_chars(W, [V1, C, V2|Tail]), 
        vowel(V1), 
        consonant(C), 
        vowel(V2), 
        !, 
        atomic_list_concat([V1, C, V2], S), 
        atomic_list_concat(Tail, RW).

    syllable(W, S, RW):- 
        atom_chars(W, [V1, C, C2, V2|Tail]), 
        vowel(V1), 
        consonant(C), 
        consonant(C2),
        vowel(V2), 
        !, 
        atomic_list_concat([V1, C, C2, V2], S), 
        atomic_list_concat(Tail, RW).

    syllable(W, W, _).

    break(W, B):- 
        syllable(W, B, ''), !.

    break(W, B):- 
        syllable(W, S, RW), 
        break(RW, B2), 
        atomic_list_concat([S, '-', B2], B).

Upvotes: 2

Views: 732

Answers (3)

user502187

I guess it's time for a DCG push-back solution. Push back is used in the second rule of break//1: it reflects that we look at four characters but consume only two:

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- \+ vowel(C).

break([V1,C,V2]) -->
   [V1,C,V2],
   {vowel(V1), consonant(C), vowel(V2)}.
break([V1,C1]), [C2,V2] -->
   [V1,C1,C2,V2],
   {vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.

syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].

So the overall solution doesn't need extra predicates such as append/3 or reverse/2. We have also placed a cut to prune the search, which is possible because of the character catch-all in the second rule of syllables//1.

Here are some example runs:

Jekejeke Prolog 2, Laufzeitbibliothek 1.1.6
(c) 1985-2016, XLOG Technologies GmbH, Schweiz

?- set_prolog_flag(double_quotes, chars).
Ja

?- phrase(syllables(R), "calculator").
R = [[c,a,l],[c,u,l,a],[t,o,r]] ;
Nein

?- phrase(syllables(R), "kitchensink").
R = [[k,i,t,c,h,e,n],[s,i,n,k]] ;
Nein

P.S.: In some older draft standards this DCG technique was called "right-hand-context", and instead of the verb "push back", the verb "prefixing" was used. In a newer draft standard this is called "semicontext", and instead of the verb "push back", the verb "restoring" is used.

https://www.complang.tuwien.ac.at/ulrich/iso-prolog/dcgs/dcgsdraft-2015-11-10.pdf
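
To see push back in isolation, here is a minimal sketch, assuming a system whose DCG translation supports push back (SWI-Prolog does). The names peek//1 and doubled//1 are made up for this illustration and are not part of the solution above:

```prolog
% peek(T): the next token is T, but it is not consumed.
% The [T] after the comma in the head is pushed back onto the input.
peek(T), [T] --> [T].

% doubled(C): the next two tokens are both C; only the first is consumed.
doubled(C) --> [C], peek(C).

% ?- phrase(peek(X), [a,b,c], Rest).
% X = a, Rest = [a,b,c].
%
% ?- phrase(doubled(C), [l,l], Rest).
% C = l, Rest = [l].
```

The second rule of break//1 works the same way: it matches four characters and pushes the last two back so that the next syllable can start with them.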

Upvotes: 1

coder

Reputation: 12992

I think you could write it more simply. Here is my implementation:

syllable(Input, Final_Word) :-
    atom_chars(Input, Char_list),
    (   split(Char_list, Word)
    ->  atom_chars(Final_Word, Word)
    ;   Final_Word = Input
    ).

split([], []).
split([X,Y,Z|T], [X,Y,Z,'-'|T1]) :-
    vowel(X), vowel(Z),
    atom_chars(Input, T),
    syllable(Input, T2),
    atom_chars(T2, T1).
split([X,Y,Z,W|T], [X,Y,'-',Z|T1]) :-
    vowel(X), \+ vowel(Y), \+ vowel(Z), vowel(W),
    atom_chars(Input, [W|T]),
    syllable(Input, T2),
    atom_chars(T2, T1).
split([X|T], [X|T1]) :-
    \+ vowel(X),
    split(T, T1).

split/2 inserts '-' wherever the rules you stated allow a break and returns the resulting character list to syllable/2, which converts it back to an atom with atom_chars/2. If the word cannot be split, the output is the input.

Example:

?- syllable(calculator,L).
L = 'calcu-lato-r'.

I don't understand why you wrote 'calculator = cal-cula-tor', since it doesn't follow the rules stated: "cal" is not vowel-consonant-vowel but consonant-vowel-consonant, and the same goes for the rest of the word.

Upvotes: 1

mat

Reputation: 40768

First, a setting that makes it much more convenient to specify lists of characters, and which I recommend you use in your code if you process text a lot:

:- set_prolog_flag(double_quotes, chars).

Second, the data, represented in such a way that the definitions can be used in all directions:

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

For example:

?- consonant(C).
dif(C, u),
dif(C, o),
dif(C, i),
dif(C, e),
dif(C, a).

whereas the version you posted incorrectly says that there is no consonant:

?- consonant(C).
false.
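
To make the difference concrete, here is a small sketch that puts both definitions side by side, assuming a system that provides dif/2 (for example SWI-Prolog). The names consonant_pure/1 and consonant_naf/1 are chosen only for this comparison:

```prolog
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

% Pure version: states that C differs from every vowel.
consonant_pure(C) :- maplist(dif(C), [a,e,i,o,u]).

% Negation-as-failure version, as in the question.
consonant_naf(C) :- \+ vowel(C).

% ?- consonant_pure(C), C = b.
% C = b.
%
% ?- consonant_naf(C), C = b.
% false.
%
% ?- C = b, consonant_naf(C).
% C = b.
```

With the negation-based version, the answer depends on the order of the goals; with dif/2, both orders succeed.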

The rules you outline are readily described in Prolog:

% rule 1: vowel-consonant-vowel (break after second vowel)
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
        vowel(V1), consonant(C), vowel(V2),
        reverse([V2,C,V1|Bs0], Bs).

% rule 2: vowel-consonant-consonant-vowel (break between the consonants)
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
        vowel(V1), consonant(C1), consonant(C2), vowel(V2),
        reverse([C1,V1|Bs0], Bs).

% alternative: no break at this position
rule([L|Ls], Bs0, Bs, Rest) :-
        rule(Ls, [L|Bs0], Bs, Rest).

Exercise: Why am I writing [V2,C,V1|_] instead of [V1,C,V2|...] in the call of reverse/2?

Now, it only remains to describe the list of resulting syllables. This is easy with DCG notation:

word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
        { rule([L|Ls], [], Bs, Rest) },
        word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].

Now the point: Since this program is completely pure and does not incorrectly commit prematurely, we can use it to show that there are also other admissible hyphenations:

?- phrase(word_breaks("calculator"), Hs).
Hs = [[c, a, l], [c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o, r]] ;
Hs = [[c, a, l, c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l, c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l, c, u, l, a, t, o, r]].

In Prolog, it is good practice to retain the generality of your code so that you can readily observe alternative solutions.
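
The result is a list of syllables, each a list of characters. If you want the hyphenated form from the question instead, a possible sketch is shown below. The name syllables_atom/2 is made up here, and atomic_list_concat/3 is assumed to be available as in SWI-Prolog:

```prolog
% syllables_atom(+Syllables, -Atom): join lists of characters with '-'.
syllables_atom(Syllables, Atom) :-
    maplist(atom_chars, Atoms, Syllables),   % [c,a,l] becomes cal, etc.
    atomic_list_concat(Atoms, '-', Atom).

% ?- syllables_atom([[c,a,l],[c,u,l,a],[t,o,r]], A).
% A = 'cal-cula-tor'.
```

Keeping this conversion separate from word_breaks//1 leaves the core relation pure and usable in all directions.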

Upvotes: 3
