andreSmol

Dependency parser evaluation with or without punctuation

I want to evaluate a dependency parser both with and without taking punctuation into account. How should I prepare the input data if I do not want punctuation to count? Should I use the same input data (normal sentences with punctuation), let the parser assign all dependencies including those involving punctuation, and then exclude all dependencies involving periods, commas, etc. during evaluation? Or should I remove the punctuation from the input sentences? And why is punctuation often excluded when evaluating a dependency parser (e.g. in CoNLL-X)?


Answers (1)

Jon Gauthier

The input data should be the same regardless of how you evaluate: the parser sees the full sentences, punctuation included, and assigns a head to every token. In the standard CoNLL evaluation we simply do not count the arcs whose dependent is a punctuation token. ("Punctuation tokens" in the standard eval are those whose POS tag is one of `` '' . , : (see the CoreNLP reference).)
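To make the "same input, different scoring" point concrete, here is a minimal Python sketch. It assumes the gold and predicted parses have already been read into per-token records (FORM, POSTAG, HEAD and DEPREL from the CoNLL-X columns) and that punctuation is identified by the gold POS tag, using the tag set listed above. `Token`, `attachment_scores` and `PUNCT_TAGS` are hypothetical names for illustration; this is not the official CoNLL-X evaluation script or CoreNLP's evaluator.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical: POS tags treated as punctuation, taken from the list above
# (assumes Penn Treebank-style tags).
PUNCT_TAGS = {"``", "''", ".", ",", ":"}


@dataclass
class Token:
    """One row of a CoNLL-X file, reduced to the columns scoring needs."""
    form: str
    pos: str     # gold POS tag (POSTAG column)
    head: int    # 1-based index of the head token, 0 for the root
    deprel: str  # dependency label


def attachment_scores(
    gold_corpus: Sequence[Sequence[Token]],
    pred_corpus: Sequence[Sequence[Token]],
    include_punct: bool = False,
) -> Tuple[float, float]:
    """Corpus-level (UAS, LAS), counting each scored token once (micro-average)."""
    if len(gold_corpus) != len(pred_corpus):
        raise ValueError("gold and predicted corpora differ in sentence count")
    scored = heads_ok = labeled_ok = 0
    for gold, pred in zip(gold_corpus, pred_corpus):
        if len(gold) != len(pred):
            raise ValueError("parser must see the same tokens as the gold file")
        for g, p in zip(gold, pred):
            # Skip the arc whose dependent is a punctuation token; the token
            # itself was still part of the parser's input.
            if not include_punct and g.pos in PUNCT_TAGS:
                continue
            scored += 1
            if g.head == p.head:
                heads_ok += 1
                if g.deprel == p.deprel:
                    labeled_ok += 1
    if scored == 0:
        return 0.0, 0.0
    return heads_ok / scored, labeled_ok / scored


# Toy example: "John saw Mary ." parsed with punctuation left in the input.
gold = [
    Token("John", "NNP", 2, "nsubj"),
    Token("saw", "VBD", 0, "root"),
    Token("Mary", "NNP", 2, "dobj"),
    Token(".", ".", 2, "punct"),
]
# Hypothetical parser output: identical except that the period hangs off "Mary".
pred = [
    Token("John", "NNP", 2, "nsubj"),
    Token("saw", "VBD", 0, "root"),
    Token("Mary", "NNP", 2, "dobj"),
    Token(".", ".", 3, "punct"),
]

print(attachment_scores([gold], [pred]))                      # (1.0, 1.0)
print(attachment_scores([gold], [pred], include_punct=True))  # (0.75, 0.75)
```

The only difference between the two numbers is whether the arc onto the period is counted; the parser's input and output are identical in both runs, so nothing has to be stripped from the input sentences.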

As for the "why," I don't have a very satisfying answer. Here are a few guesses:

  1. State-of-the-art parsers are not very good at determining punctuation dependencies (true). Scores drop substantially if we include punctuation, so real improvements in natural language parsing may be obscured by changes in punctuation performance, which is undesirable.
  2. Punctuation dependencies are a bit hard to defend, I think*: the ones in the current datasets are just a convention, and other analyses of punctuation might also be licensed. (Compare this to, e.g., an amod dependency, which can't really be disputed once we agree on an annotation scheme.)

* I'm not an expert on dependency grammars, so please don't take me too seriously :)
