Reputation: 844
I created a custom classifier-based chunker, DigDug_classifier, which chunks the following sentence:
sentence = "There is high signal intensity evident within the disc at T1."
into these chunks:
(S
(NP There/EX)
(VP is/VBZ)
(NP high/JJ signal/JJ intensity/NN evident/NN)
(PP within/IN)
(NP the/DT disc/NN)
(PP at/IN)
(NP T1/NNP)
./.)
I need to create a list of just the NPs from the above, like this:
NP = ['There', 'high signal intensity evident', 'the disc', 'T1']
I wrote the following code:
output = []
for subtree in DigDug_classifier.parse(pos_tags):
    try:
        if subtree.label() == 'NP':
            output.append(subtree)
    except AttributeError:
        output.append(subtree)
print(output)
But that gives me this answer instead:
[Tree('NP', [('There', 'EX')]), Tree('NP', [('high', 'JJ'), ('signal', 'JJ'), ('intensity', 'NN'), ('evident', 'NN')]), Tree('NP', [('the', 'DT'), ('disc', 'NN')]), Tree('NP', [('T1', 'NNP')]), ('.', '.')]
What can I do to get the desired answer?
Upvotes: 1
Views: 1523
Reputation: 17
A small modification to alvas' answer. Its last step, which gets just the leaves' strings,
[" ".join([leaf.split('/')[0] for leaf in subtree.leaves()]) for subtree in parse_tree if type(subtree) == Tree and subtree.label() == "NP"]
may raise an AttributeError when the tree comes from the chunker, as in the question's output: each leaf is then a (word, tag) tuple, and a tuple has no split() method. In that case the inner expression should be
leaf[0].split('/')[0]
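To make the tuple case concrete, here is a minimal sketch. It assumes the chunker returns the tree printed in the question, so parse_tree below is a hand-built stand-in for DigDug_classifier.parse(pos_tags), not the real chunker output. Since each leaf is already a (word, tag) pair, unpacking the tuple is enough and no split() is needed.

```python
from nltk import Tree

# Hand-built stand-in (an assumption) for DigDug_classifier.parse(pos_tags);
# the leaves are (word, tag) tuples, as in the output shown in the question.
parse_tree = Tree('S', [
    Tree('NP', [('There', 'EX')]),
    Tree('VP', [('is', 'VBZ')]),
    Tree('NP', [('high', 'JJ'), ('signal', 'JJ'),
                ('intensity', 'NN'), ('evident', 'NN')]),
    Tree('PP', [('within', 'IN')]),
    Tree('NP', [('the', 'DT'), ('disc', 'NN')]),
    Tree('PP', [('at', 'IN')]),
    Tree('NP', [('T1', 'NNP')]),
    ('.', '.'),  # punctuation stays outside any chunk
])

# Keep only NP chunks and join the word part of each (word, tag) leaf.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in parse_tree
    if isinstance(subtree, Tree) and subtree.label() == "NP"
]
print(noun_phrases)
# ['There', 'high signal intensity evident', 'the disc', 'T1']
```

The isinstance() check also skips the bare ('.', '.') tuple, which is what ended up in the question's output through the except AttributeError branch.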
Upvotes: 0
Reputation: 122052
First, see How to Traverse an NLTK Tree object?
Specific to your question of extracting the NPs:
>>> from nltk import Tree
>>> parse_tree = Tree.fromstring("""(S
... (NP There/EX)
... (VP is/VBZ)
... (NP high/JJ signal/JJ intensity/NN evident/NN)
... (PP within/IN)
... (NP the/DT disc/NN)
... (PP at/IN)
... (NP T1/NNP)
... ./.)""")
# Iterating through the parse tree and
# 1. check that the subtree is a Tree type and
# 2. make sure the subtree label is NP
>>> [subtree for subtree in parse_tree if type(subtree) == Tree and subtree.label() == "NP"]
[Tree('NP', ['There/EX']), Tree('NP', ['high/JJ', 'signal/JJ', 'intensity/NN', 'evident/NN']), Tree('NP', ['the/DT', 'disc/NN']), Tree('NP', ['T1/NNP'])]
# To access the item inside the Tree object,
# use the .leaves() function
>>> [subtree.leaves() for subtree in parse_tree if type(subtree) == Tree and subtree.label() == "NP"]
[['There/EX'], ['high/JJ', 'signal/JJ', 'intensity/NN', 'evident/NN'], ['the/DT', 'disc/NN'], ['T1/NNP']]
# To get the string representation of the leaves
# use " ".join()
>>> [' '.join(subtree.leaves()) for subtree in parse_tree if type(subtree) == Tree and subtree.label() == "NP"]
['There/EX', 'high/JJ signal/JJ intensity/NN evident/NN', 'the/DT disc/NN', 'T1/NNP']
# To just get the leaves' string,
# iterate through the leaves and split the string and
# keep the first part of the "/"
>>> [" ".join([leaf.split('/')[0] for leaf in subtree.leaves()]) for subtree in parse_tree if type(subtree) == Tree and subtree.label() == "NP"]
['There', 'high signal intensity evident', 'the disc', 'T1']
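If the chunks can be nested, or if the leaves may be either "word/TAG" strings or (word, tag) tuples, the same idea can be wrapped in a helper. This is only a sketch: np_strings is a hypothetical name, not part of NLTK. It relies on Tree.subtrees(), which walks the whole tree rather than just the top level.

```python
from nltk import Tree

def np_strings(tree):
    """Return the words of every NP subtree, at any depth, as plain strings."""
    phrases = []
    # subtrees() visits the tree itself and all nested subtrees in order.
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        words = []
        for leaf in subtree.leaves():
            if isinstance(leaf, tuple):
                # (word, tag) leaf, as produced by a chunker's parse()
                words.append(leaf[0])
            else:
                # "word/TAG" leaf, as produced by Tree.fromstring()
                words.append(leaf.rsplit("/", 1)[0])
        phrases.append(" ".join(words))
    return phrases

parse_tree = Tree.fromstring("""(S
  (NP There/EX)
  (VP is/VBZ)
  (NP high/JJ signal/JJ intensity/NN evident/NN)
  (PP within/IN)
  (NP the/DT disc/NN)
  (PP at/IN)
  (NP T1/NNP)
  ./.)""")
print(np_strings(parse_tree))
# ['There', 'high signal intensity evident', 'the disc', 'T1']
```

Note that with nested NPs this returns both the outer and the inner phrase, so filter further if only the top-level chunks are wanted.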
Upvotes: 3