Reputation: 10383
I'm working on an NLP project and I want to filter out words depending on their position in the dependency tree.
To plot the tree I'm using the code from this post:
    from nltk import Tree

    def to_nltk_tree(node):
        # Build an nltk Tree from a parsed token; leaves stay plain strings.
        if node.n_lefts + node.n_rights > 0:
            return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
        else:
            return node.orth_
For a sample sentence:
"A group of people around the world are suddenly linked mentally"
I got this tree:
From this tree what I want to get is a list of tuples with the word and its corresponding depth in the tree:
[(linked,1),(are,2),(suddenly,2),(mentally,2),(group,2),(A,3),(of,3),(people,4)....]
For this case, I'm not interested in words that have no children: [are, suddenly, mentally, A, the].
So far I have only been able to get the list of words that do have children, using this code:
    def get_words(root, words):
        # Collect every descendant of root that has children of its own.
        children = list(root.children)
        for child in children:
            if list(child.children):
                words.append(child)
                get_words(child, words)
        return list(set(words))
    [to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
    s_root = list(doc.sents)[0].root
    words = []
    words.append(s_root)
    words = get_words(s_root, words)
    words

    [around, linked, world, of, people, group]
From this, how can I get the desired tuples of each word and its depth?
Upvotes: 1
Views: 2259
Reputation: 50200
Are you sure that's an nltk Tree in your code? The nltk Tree class does not have a children attribute. With an nltk Tree, you can do what you want by using "treepositions", which are paths down the tree. Each path is a tuple of branch choices. The treeposition of "people" is (0, 2, 1, 0), and as you can see, the depth of a node is just the length of its treeposition.
First I get the paths of the leaves so I can exclude them:
    import nltk

    t = nltk.Tree.fromstring("""(linked (are suddenly mentally
        (group A (of (people (around (world the)))))))""")
    n_leaves = len(t.leaves())
    # Treepositions of all leaves, so they can be excluded later.
    leavepos = set(t.leaf_treeposition(n) for n in range(n_leaves))
Now it's easy to list the non-terminal nodes and their depth:
    >>> for pos in t.treepositions():
    ...     if pos not in leavepos:
    ...         print(t[pos].label(), len(pos))
    ...
    linked 0
    are 1
    group 2
    of 3
    people 4
    around 5
    world 6
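The question asked for (word, depth) tuples with the root at depth 1, while the loop above prints depths counted from 0. A minimal sketch of collecting the tuples instead of printing them, assuming the same tree string as above; the `isinstance` check is an alternative to building `leavepos`, and the name `pairs` is only illustrative:

```python
import nltk

# Same tree as in the answer above.
t = nltk.Tree.fromstring("""(linked (are suddenly mentally
    (group A (of (people (around (world the)))))))""")

# Keep only subtrees (leaves are plain strings) and shift the depth
# by one so that the root sits at depth 1, as in the question.
pairs = [
    (t[pos].label(), len(pos) + 1)
    for pos in t.treepositions()
    if isinstance(t[pos], nltk.Tree)
]
print(pairs)
```

Note that in this tree string "are" is itself a subtree, so it shows up in the list; that follows from how the string was written, not from the filtering.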
Incidentally, nltk trees have their own display methods. Try print(t) or t.draw(), which draws the tree in a pop-up window.
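If the nodes are not nltk trees but parser tokens that expose `.children` (as the `n_lefts` and `orth_` attributes in the question suggest), the depth can also be carried along during the recursion. The following is only a sketch under that assumption: `Node` is a hypothetical stand-in so the example runs without a parser, and `words_with_depth` is an illustrative name, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a parsed token: a word plus its children."""
    text: str
    children: List["Node"] = field(default_factory=list)


def words_with_depth(node, depth=1):
    """Return (word, depth) for every node that has children; root is depth 1."""
    kids = list(node.children)  # list() also copes with generators
    if not kids:
        return []  # words without children are skipped
    pairs = [(node.text, depth)]
    for child in kids:
        pairs.extend(words_with_depth(child, depth + 1))
    return pairs


# The dependency tree described in the question, built by hand.
tree = Node("linked", [
    Node("are"),
    Node("suddenly"),
    Node("mentally"),
    Node("group", [
        Node("A"),
        Node("of", [
            Node("people", [
                Node("around", [
                    Node("world", [Node("the")]),
                ]),
            ]),
        ]),
    ]),
])

result = words_with_depth(tree)
print(result)
```

Because the function only reads `.text` and `.children`, it should work unchanged on any object offering those two attributes; with a real parser you would pass the sentence root in place of `tree`.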
Upvotes: 1