Niranjan Sonachalam

Reputation: 1625

Dependency parsing tree in Spacy

I have the sentence "John saw a flashy hat at the store".
How can I represent it as a dependency tree like the one shown below?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))

I got this script from here

import spacy
from nltk import Tree
en_nlp = spacy.load("en_core_web_sm")  # the "en" shortcut no longer works in spaCy 3

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()

I am getting the following output, but I am looking for the bracketed (NLTK) tree format shown above.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the

Upvotes: 12

Views: 7444

Answers (2)

adam.ra

Reputation: 1078

Text representations aside, what you're trying to achieve is to get a constituency tree out of a dependency graph. Your example of desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).
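To make the distinction concrete, here is a minimal sketch that writes the dependency parse from the question in bracketed notation. The head indices are hand-written assumptions read off the pretty_print output above (with -1 marking the root, a convention chosen here), and dependency_to_bracketed is a hypothetical helper, not part of spaCy or NLTK.

```python
from collections import defaultdict


def dependency_to_bracketed(words, heads):
    """Render a dependency tree as a bracketed string, head word first."""
    children = defaultdict(list)
    root = None
    for index, head in enumerate(heads):
        if head == -1:  # assumed convention: -1 marks the root token
            root = index
        else:
            children[head].append(index)

    def render(index):
        # A token without dependents is printed as a bare word.
        if not children[index]:
            return words[index]
        parts = " ".join(render(child) for child in children[index])
        return f"({words[index]} {parts})"

    return render(root)


words = ["John", "saw", "a", "flashy", "hat", "at", "the", "store"]
# Hand-annotated heads mirroring the tree printed in the question.
heads = [1, -1, 4, 4, 1, 1, 7, 5]

bracketed = dependency_to_bracketed(words, heads)
print(bracketed)
```

Every internal node in the result is a word of the sentence. The desired output instead has phrase labels such as S, NP and VP as internal nodes, and those labels are not present anywhere in a dependency parse.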

Converting constituency trees into dependency graphs is a more or less automated task (see, for instance, https://www.researchgate.net/publication/324940566_Guidelines_for_the_CLEAR_Style_Constituent_to_Dependency_Conversion), but the other direction is not. There has been some work on it: see the PAD project https://github.com/ikekonglp/PAD and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf.

You may also want to reconsider whether you really need a constituency parse; here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e

Upvotes: 4

rebeccabilbro

Reputation: 477

To re-create an NLTK-style tree for SpaCy dependency parses, try using the draw method from nltk.tree instead of pretty_print:

import spacy
from nltk.tree import Tree

spacy_nlp = spacy.load("en_core_web_sm")  # the "en" shortcut no longer works in spaCy 3

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)
    def token_format(token):
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                       [to_nltk_tree(child) 
                        for child in node.children]
                   )
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # tree[0] is the tree for the first sentence; draw() opens it in a Tk window
    tree[0].draw()

Note that spaCy currently produces a dependency parse, with tagging at the word and noun-phrase level, rather than phrase-structure constituents. Its trees therefore won't be as deeply structured as the ones you'd get from a constituency parser such as the Stanford parser, whose output you can also visualize as a tree:

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # raw_parse yields parse trees; draw the first one in a Tk window
    tree[0].draw()

Now if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will open a window showing the dependency tree, with each node labelled word_tag_dep, and nltk_stanford_tree("John saw a flashy hat at the store.") will open one showing the constituency tree.
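If the Stanford jars are not installed, the shape of a constituency tree can still be explored by loading the bracketed string from the question directly. This is only a sketch: the string is copied from the question rather than produced by a parser, and the variable names are illustrative.

```python
from nltk.tree import Tree

# Bracketed constituency parse copied from the question (not parser output).
bracketed_parse = """
(S
  (NP (NNP John))
  (VP
    (VBD saw)
    (NP (DT a) (JJ flashy) (NN hat))
    (PP (IN at) (NP (DT the) (NN store)))))
"""

constituency_tree = Tree.fromstring(bracketed_parse)

print(constituency_tree.label())   # label of the root constituent
print(constituency_tree.leaves())  # the words, in sentence order
print(constituency_tree.pos())     # (word, tag) pairs taken from the preterminals
constituency_tree.pretty_print()   # ASCII rendering, no GUI needed
```

Unlike draw(), pretty_print() writes to the terminal, so it also works on a machine without a display.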

Upvotes: 11
