Reputation: 2837
I'm using nltk.tree.Tree
in order to read a constituency-based parse tree. I need to find the path of nodes I need to move through to get from one specific word in the tree, to another.
A quick example:
This is the parse tree of the sentece "saw the dog":
(VP (VERB saw) (NP (DET the) (NOUN dog)))
If I want the path between the word the
and dog
, it would be: DET, NP, NOUN
.
I'm not even sure how to start: how do I find the values of the leaves of the tree? How could I find a leave's/node's parent?
Thanks.
Upvotes: 2
Views: 1525
Reputation: 4749
Here is the code:
def get_lca_length(location1, location2):
i = 0
while i < len(location1) and i < len(location2) and location1[i] == location2[i]:
i+=1
return i
def get_labels_from_lca(ptree, lca_len, location):
labels = []
for i in range(lca_len, len(location)):
labels.append(ptree[location[:i]].label())
return labels
def findPath(ptree, text1, text2):
leaf_values = ptree.leaves()
leaf_index1 = leaf_values.index(text1)
leaf_index2 = leaf_values.index(text2)
location1 = ptree.leaf_treeposition(leaf_index1)
location2 = ptree.leaf_treeposition(leaf_index2)
#find length of least common ancestor (lca)
lca_len = get_lca_length(location1, location2)
#find path from the node1 to lca
labels1 = get_labels_from_lca(ptree, lca_len, location1)
#ignore the first element, because it will be counted in the second part of the path
result = labels1[1:]
#inverse, because we want to go from the node to least common ancestor
result = result[::-1]
#add path from lca to node2
result = result + get_labels_from_lca(ptree, lca_len, location2)
return result
ptree = ParentedTree.fromstring("(VP (VERB saw) (NP (DET the) (NOUN dog)))")
print(ptree.pprint())
print(findPath(ptree, 'the', "dog"))
It is based on list representation of trees, see here. Also check similar questions.
Upvotes: 1