David Yi

Reputation: 401

Using NLTK to parse with context free grammar but issues with Adjectives

The code below is a feature-based context-free grammar for NLTK in Python.

%start S
#Feature based context-free grammar
#Base start is sentence
S[SEM=<?vp(?np)>] -> NP[NUM=?n, SEM=?np] VP[NUM=?n,SEM=?vp] 


#Verb phrase expansion products
VP[NUM=?n,SEM=?v] -> LV[NUM=?n] NP[SEM=?v]
VP[NUM=?n,SEM=<?v(?obj)>] -> TV[NUM=?n,SEM=?v] NP[SEM=?obj]


#Noun phrase expansion products

NP[SEM=<?conj(?np1,?np2)>] -> NP[SEM=?np1] CC[SEM=?conj] NP[SEM=?np2]   
NP[NUM=?n] ->  Ger N[NUM=?n]
NP[NUM=?n, SEM=?np] -> N[NUM=?n, SEM=?np] 

NP[NUM=?n, SEM=<?adj(?np)>] -> ADJ[SEM=?adj] N[NUM=?n, SEM=?np] 

#Following expansion is shorthand for substantive adjective
NP[SEM=?np] -> Adj[SEM=?np]

#Lexical productions
Ger -> 'smoking'
N[NUM=sg, SEM=<\P.P(cocaine)>] -> 'gum' 

N[NUM=sg, SEM=<\P.P(sh$%)>] -> 'bad'


LV[NUM=sg] -> 'is'
LV[NUM=pl] -> 'are'

ADJ[SEM=<\x.pretty(x)>] -> 'pretty'

This code successfully parses the sentences "gum is bad" and "gum is pretty", but what I am trying to get it to do is parse the sentence "gum is pretty bad". It fails to parse this sentence and I can't figure out why. I have a feeling it is due to this rule:

NP[NUM=?n, SEM=<?adj(?np)>] -> ADJ[SEM=?adj] N[NUM=?n, SEM=?np] 

Upvotes: 1

Views: 1127

Answers (1)

Igor

Reputation: 1281

I'm not sure what your problem is here. I pasted your grammar into a file named SOgrammar.fcfg and ran this code:

from nltk import load_parser
cp = load_parser('SOgrammar.fcfg')
sentences = ['gum is pretty', 'gum is bad', 'gum is pretty bad']
for sentence in sentences:
    tokens = sentence.split()
    print("tokens:", tokens)
    for tree in cp.parse(tokens):
        print("tree;", tree)

Output:

tokens: ['gum', 'is', 'pretty']
tokens: ['gum', 'is', 'bad']
tree; (S[SEM=<sh$%(cocaine)>]
  (NP[NUM='sg', SEM=<\P.P(cocaine)>]
    (N[NUM='sg', SEM=<\P.P(cocaine)>] gum))
  (VP[NUM='sg', SEM=<\P.P(sh$%)>]
    (LV[NUM='sg'] is)
    (NP[NUM='sg', SEM=<\P.P(sh$%)>]
      (N[NUM='sg', SEM=<\P.P(sh$%)>] bad))))
tokens: ['gum', 'is', 'pretty', 'bad']
tree; (S[SEM=<pretty(\P.P(sh$%),\P.P(cocaine))>]
  (NP[NUM='sg', SEM=<\P.P(cocaine)>]
    (N[NUM='sg', SEM=<\P.P(cocaine)>] gum))
  (VP[NUM='sg', SEM=<pretty(\P.P(sh$%))>]
    (LV[NUM='sg'] is)
    (NP[NUM='sg', SEM=<pretty(\P.P(sh$%))>]
      (ADJ[SEM=<\x.pretty(x)>] pretty)
      (N[NUM='sg', SEM=<\P.P(sh$%)>] bad))))

So the grammar does parse 'gum is pretty bad'. It doesn't parse 'gum is pretty', since 'pretty' is defined as an adjective, and according to your grammar an NP cannot consist of an adjective only.
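One possible reason the adjective-only NP never fires: the grammar in the question does contain a substantive-adjective rule, `NP[SEM=?np] -> Adj[SEM=?np]`, but it spells the category `Adj`, while the lexicon defines `ADJ`. Category names in an NLTK grammar are case-sensitive, so `Adj` has no lexical entries. Below is a minimal sketch of that effect. It is a cut-down version of the grammar with the SEM features removed, and the `SUBSTANTIVE` placeholder and the `count_parses` helper are names made up for this illustration.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Cut-down grammar (SEM features omitted). SUBSTANTIVE is a placeholder
# that gets replaced by either 'Adj' (as in the question) or 'ADJ'.
GRAMMAR_TEMPLATE = """
%start S
S -> NP[NUM=?n] VP[NUM=?n]
VP[NUM=?n] -> LV[NUM=?n] NP
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> ADJ N[NUM=?n]
NP -> SUBSTANTIVE
N[NUM=sg] -> 'gum'
N[NUM=sg] -> 'bad'
LV[NUM=sg] -> 'is'
ADJ -> 'pretty'
"""


def count_parses(substantive_symbol, sentence):
    """Return the number of parse trees for `sentence` when the
    substantive-adjective rule uses `substantive_symbol` on its right side."""
    grammar_text = GRAMMAR_TEMPLATE.replace("SUBSTANTIVE", substantive_symbol)
    grammar = FeatureGrammar.fromstring(grammar_text)
    parser = FeatureChartParser(grammar)
    return len(list(parser.parse(sentence.split())))


for symbol in ("Adj", "ADJ"):
    for sentence in ("gum is pretty", "gum is pretty bad"):
        print(symbol, "|", sentence, "|", count_parses(symbol, sentence))
```

If the case mismatch is indeed the cause, the `Adj` variant should find no parse for 'gum is pretty', while the `ADJ` variant should find one. 'gum is pretty bad' goes through the `ADJ N` rule either way.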

An additional comment: from a linguistic perspective, 'pretty' isn't really an adjective here, and 'bad' isn't really a noun. Depending on what you want to achieve, this may not matter (if your grammar or domain is really small), but when you start to write larger grammars, it is a good idea to stick to word classes that have better linguistic motivation.
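As a rough sketch of what more conventional categories could look like, the grammar below treats 'bad' as an adjective and 'pretty' as either an adjective or a degree adverb, and lets the linking verb take an adjective phrase. The `AP` and `ADV` categories, the `REVISED_GRAMMAR` name, and the `parse_trees` helper are assumptions introduced for this example. They are not part of the original grammar, and the SEM features are left out.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical revision: AP (adjective phrase) and ADV (degree adverb)
# are categories added for this sketch only.
REVISED_GRAMMAR = FeatureGrammar.fromstring("""
%start S
S -> NP[NUM=?n] VP[NUM=?n]
# A linking verb takes an adjective phrase as its complement
VP[NUM=?n] -> LV[NUM=?n] AP
NP[NUM=?n] -> N[NUM=?n]
# An adjective phrase is an adjective, optionally preceded by a degree adverb
AP -> ADJ
AP -> ADV ADJ
N[NUM=sg] -> 'gum'
LV[NUM=sg] -> 'is'
LV[NUM=pl] -> 'are'
ADJ -> 'bad'
ADJ -> 'pretty'
ADV -> 'pretty'
""")

revised_parser = FeatureChartParser(REVISED_GRAMMAR)


def parse_trees(sentence):
    """Return all parse trees the revised grammar finds for `sentence`."""
    return list(revised_parser.parse(sentence.split()))


for sentence in ("gum is bad", "gum is pretty", "gum is pretty bad"):
    print(sentence, "->", len(parse_trees(sentence)), "parse(s)")
    for tree in parse_trees(sentence):
        print(tree)
```

With this layout, 'pretty bad' is analysed as an adverb modifying an adjective instead of an adjective modifying a noun, and number agreement between subject and verb is still enforced through `NUM`.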

Upvotes: 0
