Reputation: 401
The code below is a feature-based context-free grammar for NLTK in Python.
%start S
#Feature based context-free grammar
#Base start is sentence
S[SEM=<?vp(?np)>] -> NP[NUM=?n, SEM=?np] VP[NUM=?n,SEM=?vp]
#Verb phrase expansion products
VP[NUM=?n,SEM=?v] -> LV[NUM=?n] NP[SEM=?v]
VP[NUM=?n,SEM=<?v(?obj)>] -> TV[NUM=?n,SEM=?v] NP[SEM=?obj]
#Noun phrase expansion products
NP[SEM=<?conj(?np1,?np2)>] -> NP[SEM=?np1] CC[SEM=?conj] NP[SEM=?np2]
NP[NUM=?n] -> Ger N[NUM=?n]
NP[NUM=?n, SEM=?np] -> N[NUM=?n, SEM=?np]
NP[NUM=?n, SEM=<?adj(?np)>] -> ADJ[SEM=?adj] N[NUM=?n, SEM=?np]
#Following expansion is shorthand for substantive adjective
NP[SEM=?np] -> Adj[SEM=?np]
#Lexical productions
Ger -> 'smoking'
N[NUM=sg, SEM=<\P.P(cocaine)>] -> 'gum'
N[NUM=sg, SEM=<\P.P(sh$%)>] -> 'bad'
LV[NUM=sg] -> 'is'
LV[NUM=pl] -> 'are'
ADJ[SEM=<\x.pretty(x)>] -> 'pretty'
This grammar successfully parses the sentences "gum is bad" and "gum is pretty", but what I am trying to get it to do is parse the sentence "gum is pretty bad". It fails to parse this sentence and I can't figure out why. I have a feeling it is due to this production:
NP[NUM=?n, SEM=<?adj(?np)>] -> ADJ[SEM=?adj] N[NUM=?n, SEM=?np]
Upvotes: 1
Views: 1127
Reputation: 1281
I'm not sure what your problem is here. I pasted your grammar into SOgrammar.fcfg and ran the following.
Code:
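from nltk import load_parser

# Build a feature-based chart parser from the grammar file
cp = load_parser('SOgrammar.fcfg')
sentences = ['gum is pretty', 'gum is bad', 'gum is pretty bad']
for sentence in sentences:
    tokens = sentence.split()
    print("tokens:", tokens)
    # cp.parse yields one tree per analysis; nothing is printed if there is none
    for tree in cp.parse(tokens):
        print("tree;", tree)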
Output:
tokens: ['gum', 'is', 'pretty']
tokens: ['gum', 'is', 'bad']
tree; (S[SEM=<sh$%(cocaine)>]
  (NP[NUM='sg', SEM=<\P.P(cocaine)>]
    (N[NUM='sg', SEM=<\P.P(cocaine)>] gum))
  (VP[NUM='sg', SEM=<\P.P(sh$%)>]
    (LV[NUM='sg'] is)
    (NP[NUM='sg', SEM=<\P.P(sh$%)>]
      (N[NUM='sg', SEM=<\P.P(sh$%)>] bad))))
tokens: ['gum', 'is', 'pretty', 'bad']
tree; (S[SEM=<pretty(\P.P(sh$%),\P.P(cocaine))>]
  (NP[NUM='sg', SEM=<\P.P(cocaine)>]
    (N[NUM='sg', SEM=<\P.P(cocaine)>] gum))
  (VP[NUM='sg', SEM=<pretty(\P.P(sh$%))>]
    (LV[NUM='sg'] is)
    (NP[NUM='sg', SEM=<pretty(\P.P(sh$%))>]
      (ADJ[SEM=<\x.pretty(x)>] pretty)
      (N[NUM='sg', SEM=<\P.P(sh$%)>] bad))))
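So the grammar does parse 'gum is pretty bad'. It doesn't parse 'gum is pretty', since 'pretty' is defined only as an adjective (ADJ). Your grammar does have a rule for an NP that consists of an adjective only, NP[SEM=?np] -> Adj[SEM=?np], but it names the category Adj while the lexicon uses ADJ. Category names are case-sensitive, so that rule can never apply.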
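To check what the Adj/ADJ spelling does, here is a minimal sketch. It uses a stripped-down version of the grammar with the SEM features left out (my simplification, not the original file) and builds it twice: once with the bare-adjective rule written as Adj, once as ADJ. The helper name count_parses and the BARE_ADJ placeholder are made up for this example.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Simplified grammar (SEM features omitted). BARE_ADJ is a placeholder
# for the category named in the bare-adjective NP rule.
TEMPLATE = """
% start S
S -> NP[NUM=?n] VP[NUM=?n]
VP[NUM=?n] -> LV[NUM=?n] NP
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> ADJ N[NUM=?n]
NP -> BARE_ADJ
N[NUM=sg] -> 'gum'
N[NUM=sg] -> 'bad'
LV[NUM=sg] -> 'is'
LV[NUM=pl] -> 'are'
ADJ -> 'pretty'
"""


def count_parses(category, sentence):
    """Count parse trees when the bare-adjective rule uses `category`."""
    grammar = FeatureGrammar.fromstring(TEMPLATE.replace("BARE_ADJ", category))
    parser = FeatureChartParser(grammar)
    return len(list(parser.parse(sentence.split())))


for category in ("Adj", "ADJ"):
    for sentence in ("gum is pretty", "gum is pretty bad"):
        print(category, repr(sentence), count_parses(category, sentence))
```

With Adj, I would expect 'gum is pretty' to get no parse, as in the output above, and with ADJ it should get one. 'gum is pretty bad' goes through the ADJ N rule either way.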
An additional comment: from a linguistic perspective, 'pretty' isn't really an adjective here, and 'bad' isn't really a noun. In 'pretty bad', 'pretty' is a degree adverb modifying the adjective 'bad'. Depending on what you want to achieve, this may not matter (if your grammar or domain is really small), but when you start to write larger grammars, it is a good idea to stick to word classes that have better linguistic motivation.
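As a rough sketch of what more linguistically motivated categories could look like: below, 'bad' is an adjective, 'pretty' is listed both as an adjective and as a degree adverb, and the linking verb takes an adjective phrase as its complement. The AP and ADV categories and the helper name parses are my own choices for illustration, and the SEM features are left out to keep it short.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Illustrative grammar only: AP and ADV are assumed category names,
# and the SEM features of the original grammar are omitted.
GRAMMAR = FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
# A linking verb takes an adjective phrase as its complement
VP[NUM=?n] -> LV[NUM=?n] AP
AP -> ADJ
AP -> ADV ADJ
NP[NUM=?n] -> N[NUM=?n]
N[NUM=sg] -> 'gum'
LV[NUM=sg] -> 'is'
LV[NUM=pl] -> 'are'
ADJ -> 'bad'
ADJ -> 'pretty'
ADV -> 'pretty'
""")

PARSER = FeatureChartParser(GRAMMAR)


def parses(sentence):
    """Return all parse trees for a whitespace-tokenized sentence."""
    return list(PARSER.parse(sentence.split()))


for sentence in ("gum is bad", "gum is pretty", "gum is pretty bad"):
    print("tokens:", sentence.split())
    for tree in parses(sentence):
        print("tree;", tree)
```

The NUM feature still enforces agreement between subject and verb, so a string like 'gum are bad' should be rejected by this sketch.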
Upvotes: 0