Reputation: 1995
I have a sentence and I want to extract its lexical head. I can already do the dependency parsing using the Stanford NLP library.
How can I extract the main head of a sentence?
In the case of the sentence "Download and share this tool", the head would be "Download".
I've tried the following:
from nltk.parse.stanford import StanfordDependencyParser

def get_head_word(text):
    standepparse = StanfordDependencyParser(
        path_to_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser.jar',
        path_to_models_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models.jar',
        model_path='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
    parsetree = standepparse.raw_parse(text)
    p_tree = list(parsetree)[0]  # take the first parse (a DependencyGraph)
    print(p_tree.to_dot())

text = 'Download and share this tool'
get_head_word(text)
output:
digraph G{
edge [dir=forward]
node [shape=plaintext]
0 [label="0 (None)"]
0 -> 1 [label="root"]
1 [label="1 (Download)"]
1 -> 2 [label="cc"]
1 -> 3 [label="conj"]
1 -> 5 [label="dobj"]
2 [label="2 (and)"]
3 [label="3 (share)"]
4 [label="4 (this)"]
5 [label="5 (tool)"]
5 -> 4 [label="det"]
}
Upvotes: 2
Views: 900
Reputation: 122142
To find the dependency head of a sentence, look for the node whose head value points to the root node. In NLTK's DependencyGraph API, the root is the entry at address 0 of the node dictionary (the "0 (None)" node in your output), so you want the node whose head is 0.
Do note that in dependency parsing, unlike typical Chomsky normal form / CFG parse trees, there may be more than one head in the dependency parse.
But since you're casting the dependency output into a tree structure, you can do the following:
tree_head = next(n for n in p_tree.nodes.values() if n['head'] == 0)
But do note that linguistically, the head of the sentence "Download and share this tool" should be both "Download" and "share". Computationally, however, a tree is hierarchical, and a normal-form tree would have ROOT->Download->and->share, though some parsers might produce this tree too: ROOT->and->Download;share
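As a rough illustration of both points, here is a minimal sketch that runs without the Stanford jars. The `nodes` dictionary is a hand-written stand-in shaped like the node dictionary of an NLTK DependencyGraph (address mapped to a dict with `word`, `head` and `rel`), filled in from the parse shown in the question. The helper `head_words` and its `include_conjuncts` flag are hypothetical names made up for this example, not part of NLTK.

```python
# Hand-built stand-in for the node dictionary of a DependencyGraph,
# mirroring the parse of "Download and share this tool" from the question.
nodes = {
    0: {"address": 0, "word": None, "head": None, "rel": None},  # artificial root
    1: {"address": 1, "word": "Download", "head": 0, "rel": "root"},
    2: {"address": 2, "word": "and", "head": 1, "rel": "cc"},
    3: {"address": 3, "word": "share", "head": 1, "rel": "conj"},
    4: {"address": 4, "word": "this", "head": 5, "rel": "det"},
    5: {"address": 5, "word": "tool", "head": 1, "rel": "dobj"},
}


def head_words(nodes, include_conjuncts=False):
    """Return the words attached directly to the artificial root (address 0).

    With include_conjuncts=True, words linked to such a head by a "conj"
    relation are returned as well, to cover coordinated heads.
    """
    roots = [n for n in nodes.values() if n["head"] == 0]
    heads = list(roots)
    if include_conjuncts:
        root_addresses = {n["address"] for n in roots}
        heads += [
            n for n in nodes.values()
            if n["rel"] == "conj" and n["head"] in root_addresses
        ]
    # Sort by address so the words come back in sentence order.
    return [n["word"] for n in sorted(heads, key=lambda n: n["address"])]


print(head_words(nodes))
print(head_words(nodes, include_conjuncts=True))
```

The first call returns only the word attached to the root; the second also picks up "share" through the conj relation. Relation labels differ between parsers and annotation schemes, so the "conj" check is an assumption tied to the output shown in the question.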
Upvotes: 1