Reputation: 1625
I have the sentence "John saw a flashy hat at the store".
How can I represent it as a dependency tree in the format shown below?
(S
  (NP (NNP John))
  (VP
    (VBD saw)
    (NP (DT a) (JJ flashy) (NN hat))
    (PP (IN at) (NP (DT the) (NN store)))))
I got this script from here
import spacy
from nltk import Tree

en_nlp = spacy.load('en')
doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
I am getting the following, but I am looking for a tree (NLTK) format.
         saw
  ________|____________
 |        |            at
 |        |            |
 |       hat         store
 |     ___|____        |
John  a      flashy   the
Upvotes: 12
Views: 7444
Reputation: 1078
Text representations aside, what you're trying to achieve is to get a constituency tree out of a dependency graph. Your example of desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).
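To make the distinction concrete, here is a minimal sketch in plain Python (no parser involved). Both structures are written out by hand for the sentence in the question: the constituency tree is copied from the desired output and the dependency arcs from the ASCII tree the script printed. The names constituency, dependency_arcs and count_nodes are made up for this illustration.

```python
# Hand-written toy data for "John saw a flashy hat at the store" (not parser output).

# Constituency: nested (label, children...) tuples; leaves are plain strings.
constituency = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP",
        ("VBD", "saw"),
        ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
        ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))),
)

# Dependency: one (head, dependent) arc per non-root word.
dependency_arcs = [
    ("saw", "John"), ("saw", "hat"), ("saw", "at"),
    ("hat", "a"), ("hat", "flashy"),
    ("at", "store"), ("store", "the"),
]

def count_nodes(tree):
    """Return (labelled_nodes, word_leaves) for a nested-tuple tree."""
    if isinstance(tree, str):
        return 0, 1
    internal, leaves = 1, 0
    for child in tree[1:]:
        child_internal, child_leaves = count_nodes(child)
        internal += child_internal
        leaves += child_leaves
    return internal, leaves

internal, leaves = count_nodes(constituency)
dependency_nodes = {word for arc in dependency_arcs for word in arc}

print(f"constituency: {internal} labelled nodes over {leaves} words")
print(f"dependency:   {len(dependency_nodes)} nodes, all of them words")
```

The constituency tree carries phrase and tag nodes (S, NP, VP, PP, ...) that have no counterpart in the dependency tree, where every node is a word. Those extra nodes are exactly what a dependency parser does not give you.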
While converting constituency trees into dependency graphs is a more-or-less automated task (for instance, https://www.researchgate.net/publication/324940566_Guidelines_for_the_CLEAR_Style_Constituent_to_Dependency_Conversion), the other direction is not. There has been some work on it: see the PAD project https://github.com/ikekonglp/PAD and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf.
You may also want to reconsider whether you really need a constituency parse. Here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e
Upvotes: 4
Reputation: 477
To re-create an NLTK-style tree for spaCy dependency parses, try using the draw method from nltk.tree instead of pretty_print:
import spacy
from nltk.tree import Tree

# spaCy 3+ removed the "en" shortcut; load e.g. "en_core_web_sm" there
spacy_nlp = spacy.load("en")

def nltk_spacy_tree(sent):
    """
    Visualize the spaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)

    def token_format(token):
        # Label each node as word_TAG_dependency
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                        [to_nltk_tree(child)
                         for child in node.children]
                        )
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the full tree
    tree[0].draw()
Note that because spaCy currently supports dependency parsing and tagging only at the word and noun-phrase level, spaCy trees won't be as deeply structured as the ones you'd get from, for instance, the Stanford parser, which you can also visualize as a tree:
from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()
Now if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will produce this image and nltk_stanford_tree("John saw a flashy hat at the store.") will produce this one.
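If a bracketed text rendering is wanted rather than a GUI window, the same recursion can emit a string. The sketch below is plain Python and does not call spaCy; to_bracketed, words and heads are hypothetical names, and the head indices are copied by hand from the ASCII tree in the question. It assumes the root points to itself, which is the convention spaCy uses for token.head, so with a real parse the heads could presumably be collected as [token.head.i for token in doc].

```python
def to_bracketed(words, heads):
    """Render a dependency tree as a bracketed string.

    words: tokens in sentence order
    heads: heads[i] is the index of the head of words[i];
           the root is the token whose head is itself.
    """
    children = {i: [] for i in range(len(words))}
    root = None
    for i, head in enumerate(heads):
        if head == i:
            root = i
        else:
            children[head].append(i)

    def render(i):
        # A word without dependents is a bare leaf; otherwise wrap
        # the head and its rendered dependents in parentheses.
        if not children[i]:
            return words[i]
        parts = [words[i]] + [render(child) for child in children[i]]
        return "(" + " ".join(parts) + ")"

    return render(root)

# Hand-written heads for the example sentence (an assumption, not parser output).
words = ["John", "saw", "a", "flashy", "hat", "at", "the", "store"]
heads = [1, 1, 4, 4, 1, 1, 7, 5]

bracketed = to_bracketed(words, heads)
print(bracketed)  # (saw John (hat a flashy) (at (store the)))
```

This is still a dependency tree, only written in brackets: each parenthesis is headed by a word, not by a phrase label such as NP or VP.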
Upvotes: 11