aman

Reputation: 1995

Python NLTK : Extract lexical head item from Stanford dependency parsed result

I have a sentence and I want to extract its lexical head. I can already do the dependency parsing with the Stanford NLP library.

How can I extract the main head of a sentence?

In the case of the sentence Download and share this tool, the head would be Download.

I've tried the following:

 from nltk.parse.stanford import StanfordDependencyParser

 def get_head_word(text):
     # Paths point to a local Stanford Parser 3.4 installation.
     standepparse = StanfordDependencyParser(
         path_to_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser.jar',
         path_to_models_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models.jar',
         model_path='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
     parsetree = standepparse.raw_parse(text)
     p_tree = list(parsetree)[0]  # first dependency graph returned by the parser
     print(p_tree.to_dot())

 text = 'Download and share this tool'
 get_head_word(text)


output:

digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 1 [label="root"]
1 [label="1 (Download)"]
1 -> 2 [label="cc"]
1 -> 3 [label="conj"]
1 -> 5 [label="dobj"]
2 [label="2 (and)"]
3 [label="3 (share)"]
4 [label="4 (this)"]
5 [label="5 (software)"]
5 -> 4 [label="det"]
}

Upvotes: 2

Views: 900

Answers (1)

alvas

Reputation: 122142

To find the dependency head of a sentence, look for the nodes whose head value points to the root node. In NLTK's DependencyGraph API, that means looking for the node whose head is the first entry of the node dictionary (the artificial root, shown as node 0 in your output).

Do note that in dependency parsing, unlike typical Chomsky normal form / CFG parse trees, there might be more than one head in the dependency parse.

But since you're casting the dependency output into a Tree structure, you can do the following:

tree_head = next(n for n in p_tree.nodes.values() if n['head'] == 0)
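To see the lookup without a Stanford Parser installation, here is a minimal sketch. It assumes a plain dict as a hypothetical stand-in for the graph's node dictionary, filled in by hand from the dot output in the question; `find_heads` is an illustrative helper, not part of NLTK.

```python
# Hypothetical stand-in for the parser's node dictionary, copied by hand
# from the dot output in the question: address -> word and head address.
nodes = {
    0: {"word": None, "head": None},      # artificial root
    1: {"word": "Download", "head": 0},
    2: {"word": "and", "head": 1},
    3: {"word": "share", "head": 1},
    4: {"word": "this", "head": 5},
    5: {"word": "software", "head": 1},
}


def find_heads(nodes, root_address=0):
    """Return the words whose head is the artificial root node."""
    return [
        nodes[address]["word"]
        for address in sorted(nodes)
        if nodes[address]["head"] == root_address
    ]


heads = find_heads(nodes)
print(heads)
```

Returning a list rather than a single node keeps the case of several root dependents visible instead of silently picking the first one.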

But do note that linguistically, the head of the sentence Download and share this tool should be Download and share. Computationally, though, a tree is hierarchical, so a normal-form tree would have ROOT->Download->and->share, while some parsers might produce this tree instead: ROOT->and->Download;share
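If you want the whole coordinated head rather than only the first conjunct, one possible approach is to take the root's dependent and add its cc and conj dependents. This is only a sketch: the dict is a hand-written stand-in based on the dot output in the question, `coordinated_head` is a hypothetical helper, and it assumes the first conjunct is attached to the root as in that output.

```python
# Hand-written stand-in for the parse in the question:
# address -> word, head address, and relation to the head.
nodes = {
    0: {"word": None, "head": None, "rel": None},   # artificial root
    1: {"word": "Download", "head": 0, "rel": "root"},
    2: {"word": "and", "head": 1, "rel": "cc"},
    3: {"word": "share", "head": 1, "rel": "conj"},
    4: {"word": "this", "head": 5, "rel": "det"},
    5: {"word": "software", "head": 1, "rel": "dobj"},
}


def coordinated_head(nodes, root_address=0):
    """Return the root's dependent plus its cc/conj dependents, in word order."""
    tops = [a for a in sorted(nodes) if nodes[a]["head"] == root_address]
    if not tops:
        return []
    top = tops[0]
    # Keep only the coordination relations attached to the top node.
    span = [top] + [
        a for a in sorted(nodes)
        if nodes[a]["head"] == top and nodes[a]["rel"] in ("cc", "conj")
    ]
    return [nodes[a]["word"] for a in sorted(span)]


print(" ".join(coordinated_head(nodes)))
```

A parser that makes the conjunction the root (ROOT->and->Download;share) would need the same idea applied the other way round, collecting the conjuncts below the conjunction.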

Upvotes: 1
