nizam uddin

Reputation: 341

How to get JJ and NN (adjective and noun) from the triples generated by StanfordDependencyParser with NLTK?

I got triples using the following code, but I want to get the nouns and adjectives from the triples. I tried a lot but failed; I'm new to NLTK and Python. Any help?

from nltk.parse.stanford import StanfordDependencyParser

# Raw strings (r'...') keep the Windows backslashes from being read as escape sequences.
dp_prsr = StanfordDependencyParser(
    r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser.jar',
    r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser-3.5.2-models.jar',
    model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
word = []
s = 'bit is good university'
sentence = dp_prsr.raw_parse(s)
for line in sentence:
    # each parse's triples() yields ((head_word, head_tag), relation, (dep_word, dep_tag))
    print(list(line.triples()))

[(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]

I want to get 'university' and 'good', and 'bit' and 'university'. I tried the following, but it didn't work:

for line in sentence:
    if (list(line.triples)).__contains__() == 'JJ':
        word.append(list(line.triples()))
print(word)

But I get an empty list. Any help would be appreciated.
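One likely reason for the empty list, separate from the filtering logic: `raw_parse` returns an iterator rather than a list, so the first `for line in sentence:` loop consumes it and the second loop never runs. (Had it run, `list(line.triples)` without the call parentheses would have raised a TypeError rather than returning an empty list.) The sketch below reproduces the effect with a plain iterator; `fake_parses` is a hypothetical stand-in for the parser output, not an NLTK object. Calling `list(dp_prsr.raw_parse(s))` once and looping over that list avoids the problem.

```python
# Hypothetical stand-in for dp_prsr.raw_parse(s): a one-shot iterator over parses.
# (A real call yields DependencyGraph objects instead of strings.)
fake_parses = iter(['parse-1'])

first_pass = [p for p in fake_parses]   # consumes the iterator
second_pass = [p for p in fake_parses]  # nothing left to read

print(first_pass)   # ['parse-1']
print(second_pass)  # []

# Materialising the results once gives a list that can be looped over repeatedly.
parses = list(iter(['parse-1']))
print([p for p in parses], [p for p in parses])  # ['parse-1'] ['parse-1']
```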

Upvotes: 1

Views: 1763

Answers (1)

alvas

Reputation: 122148

Linguistically

When you look for triplets that contain a JJ and an NN, what you're usually looking for is a noun phrase (NP) in a context-free grammar.

In dependency grammar, what you're looking for is a triplet that contains the JJ and NN POS tags in its arguments. More specifically, you're looking for a constituent/branch that contains a noun modified by an adjective. In the StanfordDependencyParser output, that means looking for the predicate amod. (If the explanation above is confusing, it is advisable to read up on dependency grammar before proceeding; see https://en.wikipedia.org/wiki/Dependency_grammar.)

Note that the parser outputs triplets of the form (arg1, pred, arg2), where argument 2 (arg2) depends on argument 1 (arg1) through the predicate (pred) relation; i.e. arg1 governs arg2 (see https://en.wikipedia.org/wiki/Government_(linguistics)).
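To make the (arg1, pred, arg2) layout concrete, here is a small sketch that unpacks each triple into governor and dependent. It uses the triples printed in the question as literal data, so no parser is needed; the helper name `describe` is hypothetical.

```python
# Triples copied from the question's output: (governor, relation, dependent),
# where governor and dependent are (word, POS tag) pairs.
triples = [
    (('university', 'NN'), 'nsubj', ('bit', 'NN')),
    (('university', 'NN'), 'cop', ('is', 'VBZ')),
    (('university', 'NN'), 'amod', ('good', 'JJ')),
]

def describe(triple):
    """Hypothetical helper: render one triple as 'governor -relation-> dependent'."""
    (gov_word, gov_tag), rel, (dep_word, dep_tag) = triple
    return '{}/{} -{}-> {}/{}'.format(gov_word, gov_tag, rel, dep_word, dep_tag)

for t in triples:
    print(describe(t))
# university/NN -nsubj-> bit/NN
# university/NN -cop-> is/VBZ
# university/NN -amod-> good/JJ
```

Reading the arrow from governor to dependent matches the description above: "good" hangs off "university" through amod, not the other way round.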


Pythonically

Now to the code. You want to iterate through a list of tuples (i.e. triplets), so the easiest solution is to unpack each tuple into named variables as you iterate and then check the conditions you need (see "Find an element in a list of tuples"):

>>> x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
>>> for arg1, pred, arg2 in x:
...     word1, pos1 = arg1
...     word2, pos2 = arg2
...     if pos1.startswith('NN') and pos2.startswith('JJ') and pred == 'amod':
...         print((arg1, pred, arg2))
... 
(('university', 'NN'), 'amod', ('good', 'JJ'))
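The same check can be written as a list comprehension that collects just the words, which is closer to what the question asks for. The sketch below also picks up noun-noun pairs linked by nsubj (the 'bit'/'university' pair from the question); treating nsubj that way is an assumption on top of the answer, which only filters on amod. Both function names are hypothetical.

```python
def noun_adjective_pairs(triples):
    """Return (noun, adjective) word pairs linked by an 'amod' relation."""
    return [
        (gov_word, dep_word)
        for (gov_word, gov_tag), rel, (dep_word, dep_tag) in triples
        if rel == 'amod' and gov_tag.startswith('NN') and dep_tag.startswith('JJ')
    ]

def noun_subject_pairs(triples):
    """Assumed extension: (subject noun, head noun) pairs linked by 'nsubj'."""
    return [
        (dep_word, gov_word)
        for (gov_word, gov_tag), rel, (dep_word, dep_tag) in triples
        if rel == 'nsubj' and gov_tag.startswith('NN') and dep_tag.startswith('NN')
    ]

x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')),
     (('university', 'NN'), 'cop', ('is', 'VBZ')),
     (('university', 'NN'), 'amod', ('good', 'JJ'))]

print(noun_adjective_pairs(x))  # [('university', 'good')]
print(noun_subject_pairs(x))    # [('bit', 'university')]
```

Using startswith('NN') and startswith('JJ') keeps plural nouns (NNS, NNPS) and comparative/superlative adjectives (JJR, JJS) in the results, as in the loop above.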

Upvotes: 2
