CSStudent

Reputation: 329

Stanford Parser: not returning collapsed dependencies

I am running into the following problem with the collapsed dependencies.

In the sentence below, I expect "who" to be replaced by "golfer" as the subject of "scored":

The golfer who scored a 61 won the tournament.

Typed collapsed dependencies returned by the online Stanford parser:

det(golfer-2, The-1)
nsubj(scored-4, golfer-2)
nsubj(won-7, golfer-2)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
root(ROOT-0, won-7)
det(tournament-9, the-8)
dobj(won-7, tournament-9)

Dependencies returned by the downloaded software:

root(ROOT-0, won-7)
det(golfer-2, The-1)
nsubj(won-7, golfer-2)
nsubj(scored-4, who-3)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
det(tournament-9, the-8)
dobj(won-7, tournament-9)
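
The two lists differ in a single edge: the online parser attaches the subject of "scored" to the antecedent "golfer-2", while the local run leaves it on the relative pronoun "who-3". As a toy illustration of the rewrite I am expecting (not Stanford's actual implementation; the class and method names here are hypothetical), the edge can be redirected through the rcmod relation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RelativePronounSketch {

    /** A typed dependency such as nsubj(scored-4, who-3). */
    public static final class Dep {
        public final String rel;
        public final String gov;
        public final String dep;

        public Dep(String rel, String gov, String dep) {
            this.rel = rel;
            this.gov = gov;
            this.dep = dep;
        }

        @Override
        public String toString() {
            return rel + "(" + gov + ", " + dep + ")";
        }
    }

    /** The dependencies printed by the local CoreNLP run, in the same order. */
    public static List<Dep> localOutput() {
        return Arrays.asList(
            new Dep("root", "ROOT-0", "won-7"),
            new Dep("det", "golfer-2", "The-1"),
            new Dep("nsubj", "won-7", "golfer-2"),
            new Dep("nsubj", "scored-4", "who-3"),
            new Dep("rcmod", "golfer-2", "scored-4"),
            new Dep("det", "61-6", "a-5"),
            new Dep("dobj", "scored-4", "61-6"),
            new Dep("det", "tournament-9", "the-8"),
            new Dep("dobj", "won-7", "tournament-9"));
    }

    /**
     * If the relative pronoun depends on a verb that is the rcmod of some noun,
     * re-attach that edge to the noun (the antecedent) instead of the pronoun.
     */
    public static List<Dep> resolveRelativePronoun(List<Dep> deps, String pronoun) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            Dep rewritten = d;
            if (d.dep.equals(pronoun)) {
                for (Dep r : deps) {
                    if (r.rel.equals("rcmod") && r.dep.equals(d.gov)) {
                        rewritten = new Dep(d.rel, d.gov, r.gov);
                    }
                }
            }
            out.add(rewritten);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Dep d : resolveRelativePronoun(localOutput(), "who-3")) {
            System.out.println(d);
        }
    }
}
```

Applied to the local output, this turns nsubj(scored-4, who-3) into nsubj(scored-4, golfer-2) and leaves the other eight edges unchanged, which matches the set of edges the online parser returns.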

The configuration used:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    ......
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    System.out.println(dependencies.toList());

Thanks in advance.

EDIT

As a workaround, I now build the SemanticGraph from the GrammaticalStructure directly, which gives the expected dependencies:


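    // gsf is assumed to be a GrammaticalStructureFactory created elsewhere, e.g.
    // new PennTreebankLanguagePack().grammaticalStructureFactory()
    Tree tree = sentence.get(TreeAnnotation.class);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
    Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    SemanticGraph dependencies = new SemanticGraph(tdl);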

Upvotes: 1

Views: 782

Answers (2)

Christopher Manning

Reputation: 9450

This is a legitimate regression from past behavior. I don't think there was any reason to take this out; it just somehow broke and no one noticed. It looks like this happened a while ago: version 3.2 seems to be the last version that correctly produced nsubj(scored-4, golfer-2). Feel free to file an issue on GitHub.

Somehow this only happens with CoreNLP, and not if you call the parser directly, so there must be some difference in the code paths. If you run this command, you get what you want:

stanford-corenlp-full-2015-01-30 manning$ echo "The golfer who scored a 61 won the tournament." | java -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat penn,typedDependencies edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz -
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
Parsing file: -
Parsing [sent. 1 len. 10]: The golfer who scored a 61 won the tournament .
(ROOT
  (S
    (NP
      (NP (DT The) (NN golfer))
      (SBAR
        (WHNP (WP who))
        (S
          (VP (VBD scored)
            (NP (DT a) (CD 61))))))
    (VP (VBD won)
      (NP (DT the) (NN tournament)))
    (. .)))

det(golfer-2, The-1)
nsubj(scored-4, golfer-2)
nsubj(won-7, golfer-2)
rcmod(golfer-2, scored-4)
det(61-6, a-5)
dobj(scored-4, 61-6)
root(ROOT-0, won-7)
det(tournament-9, the-8)
dobj(won-7, tournament-9)

Parsed file: - [1 sentences].
Parsed 10 words in 1 sentences (30.86 wds/sec; 3.09 sents/sec).

Upvotes: 1

Jon Gauthier

Reputation: 25582

CoreNLP first generates part-of-speech tags for the sentence with the "pos" annotator. The parser then uses these tags as priors during parsing.

This usually explains the discrepancies between the online parser demo and running CoreNLP locally. Can you try disabling the POS tagger annotator and see if the resulting parse changes?
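
A minimal sketch of that experiment, using only java.util.Properties so it runs on its own. The class and method names are hypothetical, and it assumes that "lemma" and "ner" have to be dropped together with "pos" because they rely on POS tags; check that against your CoreNLP version:

```java
import java.util.Properties;

public class NoPosTaggerConfig {

    /** Builds pipeline properties that leave out the "pos" annotator. */
    public static Properties buildProps() {
        Properties props = new Properties();
        // Assumption: "lemma" and "ner" depend on POS tags, so they are
        // removed along with "pos"; the parser then assigns its own tags.
        props.setProperty("annotators", "tokenize, ssplit, parse");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        System.out.println(props.getProperty("annotators"));
        // Pass props to the pipeline exactly as in the question's setup,
        // then compare the printed dependencies with the earlier run.
    }
}
```

If the dependencies still come out the same with this annotator list, the POS tagger is probably not the cause of the difference.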

Upvotes: 1
