Reputation: 31
How can I get the phrase tag corresponding to each word in a sentence?
For example, take this sentence:
My dog also likes eating sausage.
I can get a parse tree in Stanford NLP such as
(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))
In the above situation, I want to get the phrase tag corresponding to each word, like
(My - NP), (dog - NP), (also - ADVP), (likes - VP), ...
Is there a method for this simple extraction of phrase tags?
Please help me.
Upvotes: 3
Views: 3972
Reputation: 21
I'm basically continuing the original question. Using Stanza, I would like to obtain a phrase tag per word in a text.
import stanza
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,constituency')
doc = nlp('My dog also likes eating sausage.')
tree = doc.sentences[0].constituency
The parse tree obtained is:
(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .)))
but I would like phrase tags (exactly as in the original question):
(My - NP), (dog - NP), (also - ADVP), (likes - VP), ...
Has this been solved? Thanks!
Upvotes: 1
Reputation: 61
// I guess this is how you get your parse tree.
Tree tree = sentAnno.get(TreeAnnotation.class);
// The children of a Tree are returned as an array of trees.
Tree[] children = tree.children();
// Check the label of each subtree to see whether it is what you want (a phrase).
// Note: this only inspects the direct children of tree, not deeper subtrees.
for (Tree child : children) {
    if (child.value().equals("NP")) { // set your rule for defining a phrase here
        List<Tree> leaves = child.getLeaves(); // leaves correspond to the tokens
        for (Tree leaf : leaves) {
            List<Word> words = leaf.yieldWords();
            for (Word word : words)
                System.out.print(String.format("(%s - NP),", word.word()));
        }
    }
}
The code is not fully tested, but I think it roughly does what you need. Also, I didn't write anything to recursively visit the subtrees, but I believe you should be able to add that.
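For the recursive visit, here is a minimal Python sketch that works directly on the bracketed string printed by the parser, so it does not depend on any parser library. It assumes that the "phrase tag" of a word is the label of the node directly above its part-of-speech node. The helper names `parse_bracketed` and `phrase_tags` are made up for this illustration and are not part of Stanford NLP or Stanza.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append((tokens[pos], []))  # a leaf: the word itself
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read_node()


def phrase_tags(node, parent_label=None):
    """Return (word, phrase tag) pairs, visiting the tree recursively."""
    label, children = node
    # A POS node (preterminal) has exactly one child, and that child is a word.
    if len(children) == 1 and not children[0][1]:
        return [(children[0][0], parent_label)]
    pairs = []
    for child in children:
        pairs.extend(phrase_tags(child, label))
    return pairs


tree = parse_bracketed(
    "(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
    "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))"
)
print(phrase_tags(tree))
```

On the tree from the question this pairs My and dog with NP, also with ADVP, and likes with VP. The final period ends up paired with S, since that is its closest phrase node. The same recursion should carry over to a parser's own tree objects, as long as they expose a label and a list of children.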
Upvotes: 2