Reputation: 341
I got triples using the following code, but I want to get the nouns and adjectives from the triples. I tried a lot but failed. I'm new to NLTK and Python; any help?
from nltk.parse.stanford import StanfordDependencyParser
dp_prsr = StanfordDependencyParser(r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser.jar', r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser-3.5.2-models.jar', model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
word=[]
s='bit is good university'
sentence = dp_prsr.raw_parse(s)
for line in sentence:
    print(list(line.triples()))
[(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
I want to get university and good, and bit and university. I tried the following, but it didn't work:
for line in sentence:
    if (list(line.triples)).__contains__() == 'JJ':
        word.append(list(line.triples()))
print(word)
But I get an empty list. Any help, please?
Upvotes: 1
Views: 1763
Reputation: 122148
What you're looking for, when you look for triplets that contain a JJ and an NN, is usually a noun phrase (NP) in a context-free grammar.
In dependency grammar, what you're looking for is a triplet that contains the JJ and NN POS tags in its arguments; more specifically, a constituent/branch that contains an adjectivally modified noun. In the StanfordDependencyParser output, you need to look for the predicate amod. (If you're confused by the explanation above, it is advisable to read up on dependency grammar before proceeding; see https://en.wikipedia.org/wiki/Dependency_grammar.)
Note that the parser outputs triplets of the form (arg1, pred, arg2), where argument 2 (arg2) depends on argument 1 (arg1) through the predicate (pred) relation, i.e. arg1 governs arg2 (see https://en.wikipedia.org/wiki/Government_(linguistics)).
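To make the direction of the relation concrete, here is a minimal sketch that groups the sample triples from the question by their governor (arg1). The helper name `dependents_by_head` is hypothetical, not part of NLTK; it only assumes the `((word, tag), relation, (word, tag))` shape shown in the question's output.

```python
from collections import defaultdict

# Triples copied from the question's parser output: (governor, relation, dependent)
triples = [
    (('university', 'NN'), 'nsubj', ('bit', 'NN')),
    (('university', 'NN'), 'cop', ('is', 'VBZ')),
    (('university', 'NN'), 'amod', ('good', 'JJ')),
]

def dependents_by_head(triples):
    """Hypothetical helper: map each governor word (arg1) to the
    (relation, dependent word) pairs it governs (arg2)."""
    heads = defaultdict(list)
    for (head_word, _head_pos), rel, (dep_word, _dep_pos) in triples:
        heads[head_word].append((rel, dep_word))
    return dict(heads)

print(dependents_by_head(triples))
# {'university': [('nsubj', 'bit'), ('cop', 'is'), ('amod', 'good')]}
```

Every dependent, including the adjective "good", hangs off "university", which is why the noun is always arg1 in the amod triple.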
Now to the code part of the answer. You want to iterate through a list of tuples (i.e. triplets), so the easiest solution is to unpack each tuple into variables as you iterate and then check for the conditions you need (see "Find an element in a list of tuples"):
>>> x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
>>> for arg1, pred, arg2 in x:
...     word1, pos1 = arg1
...     word2, pos2 = arg2
...     if pos1.startswith('NN') and pos2.startswith('JJ') and pred == 'amod':
...         print((arg1, pred, arg2))
...
(('university', 'NN'), 'amod', ('good', 'JJ'))
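The same check can be wrapped into reusable functions. This is a sketch only; the names `noun_adj_pairs` and `nouns_and_adjectives` are hypothetical. The first returns just the (noun, adjective) words from amod triples; the second collects every NN*/JJ* word on either side of any triple, which also picks up "bit" from the nsubj triple that the question asked about.

```python
def noun_adj_pairs(triples):
    """Return (noun, adjective) word pairs from amod triples whose
    governor is tagged NN* and whose dependent is tagged JJ*."""
    return [
        (noun, adj)
        for (noun, noun_pos), pred, (adj, adj_pos) in triples
        if pred == 'amod' and noun_pos.startswith('NN') and adj_pos.startswith('JJ')
    ]

def nouns_and_adjectives(triples):
    """Collect every distinct word tagged NN* or JJ* in either argument."""
    words = set()
    for arg1, _pred, arg2 in triples:
        for word, pos in (arg1, arg2):
            if pos.startswith(('NN', 'JJ')):
                words.add(word)
    return words

x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')),
     (('university', 'NN'), 'cop', ('is', 'VBZ')),
     (('university', 'NN'), 'amod', ('good', 'JJ'))]

print(noun_adj_pairs(x))                # [('university', 'good')]
print(sorted(nouns_and_adjectives(x)))  # ['bit', 'good', 'university']
```

With the parser from the question, something like `graph = next(dp_prsr.raw_parse(s))` followed by `noun_adj_pairs(graph.triples())` should work. Keep in mind that `raw_parse` appears to return an iterator, so it can be looped over only once; in the question's code the first `for line in sentence:` loop likely consumes it, so the second loop never runs and `word` stays empty. Wrapping the result in `list(...)` first avoids that.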
Upvotes: 2