Xueqi Zhao
Xueqi Zhao

Reputation: 25

NLP PROLOG Grammar

We have the formal language

G 1 = { V , T , S , P }, where
V = { S , E }
T = { x , y , z }
P = { S->E , E->xE , E->yE , E->z }

Can we accept the seven sentences { xz , xy , xyz , xyxz , z , xxyz , Xyz } as well-formed formulas?Verify this using Prolog.

here is my code:

s --> e.
e --> [x], e.
e --> [y], e.
e --> [z].

It can only recognize s([z], R). Why?

?- s([z], R).
R = [].

?- s([xz], R).
false.

?- s([x], R).
false.

Upvotes: 1

Views: 200

Answers (2)

tas
tas

Reputation: 8140

Firstly, as already pointed out by @lurker in the comments, always use phrase/2 or phrase/3 when calling DCGs. Secondly, as pointed out by @TomasBy and @WillBeason, your DCG is describing a list containing the atoms x, y and z, separated by commas. So, to test if xz is an actual sentence (I assume that's what s stands for) according to your grammar, you would query:

?- phrase(s,[x,z]).
true ;
false.

Indeed it is. Now let's take a look at the most general query, that is, ask Which sentences are there?:

?- phrase(s,S).
ERROR: Out of local stack

That didn't go too well. The reason for that is the order of the DCG rules: calling s//0 leads to the first rule of e//0 being called, that recursively calls e//0 again, that is to say, the first rule of e//0 and so the loop continues until Prolog runs out of stack. So let's alter the ordering of the rules by putting the non-recursive rule first...

s --> e.

e --> [z].          % <- moved here from last position
e --> [x], e.
e --> [y], e.

... and retry the query:

?- phrase(s,S).
S = [z] ;
S = [x, z] ;
S = [x, x, z] ;
S = [x, x, x, z] ;
.
.
.

Now we're getting actual solutions. So the ordering of the DCG rules does matter. However, the listing of the answers is unfair, since the last rule of e//0, the one dealing with y, is actually never called. That can be remedied by prefixing a goal length/2. The query...

?- length(S,_).
S = [] ;
S = [_G3671] ;
S = [_G3671, _G3674] ;
S = [_G3671, _G3674, _G3677] ;
.
.
.

...yields lists with all possible lengths, so prefixing it to the DCG call, will make Prolog look for all solutions of length 0 before moving on to length 1 before moving on to length 2 and so on...

?- length(S,_), phrase(s,S).
S = [z] ;                    % <- solutions of length 1 from here
S = [x, z] ;                 % <- solutions of length 2 from here
S = [y, z] ;
S = [x, x, z] ;              % <- solutions of length 3 from here
S = [x, y, z] ;
S = [y, x, z] ;
S = [y, y, z] ;
S = [x, x, x, z] ;           % <- solutions of length 4 from here
.
.
.

So your grammar is actually producing sentences of arbitrary length, as it should be. Moving on to your seven sentence example, if your application requires limiting the length of the lists, you can do that by prefixing a goal between/3 to your query...

?- between(1,3,N), length(S,N), phrase(s,S).
N = 1,
S = [z] ;
N = 2,
S = [x, z] ;
N = 2,
S = [y, z] ;
N = 3,
S = [x, x, z] ;
N = 3,
S = [x, y, z] ;
N = 3,
S = [y, x, z] ;
N = 3,
S = [y, y, z] ;
false.

... that will now yield all seven sentences consisting of at most 3 words. Note that your example { xz , xy , xyz , xyxz , z , xxyz , xyz } is not exactly the set of sentences described by your grammar. The element xy is not a sentence at all according to the grammar rules. The sentences xyxz and xxyz are specified by your grammar but require a maximum length of at least four words, which will yield sixteen answers. And the last of the seven sentences, xyz, appears twice in your example, unless you meant the query...

?- phrase(s,[X,y,z]).
X = x ;
X = y ;
false.

... that yields to two sentences, the first of which still constitutes a duplicate.

Finally, if you really desperately need to get atoms as an answer, you can alter the DCG to put the codes corresponding to x, y and z into the list instead of the actual atoms. Then you can use atom_codes/2 to get the sentences as a single atom instead of a list of words:

s --> e.

e --> [0'z].      % 0'z denotes the code corresponding to z
e --> [0'x], e.   % 0'x denotes the code corresponding to x
e --> [0'y], e.   % 0'y denotes the code corresponding to y

?- between(1,3,N), length(S,N), phrase(s,S), atom_codes(A,S).
N = 1,
S = [122],
A = z ;
N = 2,
S = [120, 122],
A = xz ;
N = 2,
S = [121, 122],
A = yz ;
N = 3,
S = [120, 120, 122],
A = xxz ;
N = 3,
S = [120, 121, 122],
A = xyz ;
N = 3,
S = [121, 120, 122],
A = yxz ;
N = 3,
S = [121, 121, 122],
A = yyz ;
false.

Upvotes: 5

Will Beason
Will Beason

Reputation: 3551

(From the comment above)

Each element in the sentence needs to be comma-separated. Strings are treated atomically, not as sequences of characters.

?- s([x,z], R).
R = [] .

?- s([x,y,x,z], R).
R = [] .

?- s([x,y], R).
false.

Upvotes: 0

Related Questions