Reputation: 131
When using the parser, or for that matter any of the annotators in CoreNLP, is there a way to access the probability or the margin of error?
I am particularly interested in detecting ambiguity programmatically. For example, in the sentence below, desire is tagged as a noun, but it could also be a verb.
I want to know if there is a way to retrieve a confidence score from the CoreNLP API that indicates ambiguity.
(NP (NP (NNP Whereas)) (, ,) (NP (NNP users) (NN desire) (S (VP (TO to) (VP (VB sell))))))
In this case, desire is labeled as NN (noun) instead of a verb. I need a way to check how confident CoreNLP is about this classification.
Upvotes: 13
Views: 654
Reputation: 615
Stanford CoreNLP does not provide direct ambiguity scores, but you can detect ambiguity programmatically using these methods:
1. POS Tagging Probability – Use the MaxentTagger in CoreNLP to get log-probabilities for POS tags. Lower probabilities indicate higher ambiguity.
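Once you have per-tag log-probabilities out of the tagger, deciding "ambiguous or not" is just a thresholding step on the margin between the top two tags. A minimal sketch of that step (the function name, the margin value, and the scores for "desire" below are all illustrative, not CoreNLP output):

```python
import math

def is_ambiguous(tag_logprobs, margin=1.0):
    """Flag a token as ambiguous when the log-probability gap between
    the best and second-best POS tag is smaller than `margin`."""
    ranked = sorted(tag_logprobs.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return False
    best, second = ranked[0][1], ranked[1][1]
    return (best - second) < margin

# Hypothetical log-probabilities for "desire" (illustrative numbers only)
scores = {"NN": math.log(0.55), "VB": math.log(0.40), "JJ": math.log(0.05)}
print(is_ambiguous(scores))  # True: NN and VB are close
```

A small margin means the tagger was nearly torn between two tags, which is exactly the situation in the question's sentence.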
2. Multiple Parses – Use n-best parsing with the Stanford Parser to check whether alternative parses exist for the sentence. If "desire" appears with different POS tags across the n-best parses, it's ambiguous.
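Given the bracketed parse strings the parser returns, you can compare the tag a word receives in each parse with a simple pattern match. A sketch (the second parse below is a hypothetical alternative, not actual parser output):

```python
import re

def tags_for(word, parse):
    """Collect the POS tags assigned to `word` in one bracketed parse string."""
    return set(re.findall(r"\((\S+) " + re.escape(word) + r"\)", parse))

def ambiguous_across_parses(word, parses):
    """True if the n-best parses disagree on the word's POS tag."""
    tags = set()
    for p in parses:
        tags |= tags_for(word, p)
    return len(tags) > 1

# The parse from the question plus a hypothetical alternative from an n-best list
nbest = [
    "(NP (NP (NNP Whereas)) (, ,) (NP (NNP users) (NN desire) (S (VP (TO to) (VP (VB sell))))))",
    "(S (SBAR (IN Whereas)) (, ,) (NP (NNS users)) (VP (VBP desire) (S (VP (TO to) (VP (VB sell))))))",
]
print(ambiguous_across_parses("desire", nbest))  # True: tagged NN and VBP
```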
3. Compare Multiple Taggers – Run CoreNLP alongside Stanza, spaCy, or Flair and compare POS tags. Different outputs suggest ambiguity.
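Once each tagger's output is reduced to a token-to-tag mapping, the comparison itself is trivial. A sketch of that step (the tag dictionaries below are hand-written examples, not real tagger output):

```python
from collections import defaultdict

def disagreements(taggings):
    """Given {tagger_name: {token: tag}}, return tokens the taggers disagree on."""
    by_token = defaultdict(set)
    for tags in taggings.values():
        for token, tag in tags.items():
            by_token[token].add(tag)
    return {tok for tok, tags in by_token.items() if len(tags) > 1}

# Hypothetical outputs from three taggers for the same sentence
outputs = {
    "corenlp": {"users": "NNS", "desire": "NN",  "sell": "VB"},
    "spacy":   {"users": "NNS", "desire": "VBP", "sell": "VB"},
    "stanza":  {"users": "NNS", "desire": "VBP", "sell": "VB"},
}
print(disagreements(outputs))  # {'desire'}
```

Note that the taggers use different tagsets by default (spaCy's coarse tags vs. Penn Treebank), so map them to a common tagset before comparing.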
4. Dependency Parsing Checks – If "desire" is tagged as a noun but functions as a verb in the dependency parse, ambiguity exists.
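The structural mismatch can be checked mechanically: a noun-tagged token that heads verb-typical relations (such as an open clausal complement like "to sell") is suspicious. A sketch over a plain edge list (the relation set, tags, and edges below are illustrative assumptions, not parser output):

```python
def verb_like_noun(token, pos_tags, deps):
    """Flag a noun-tagged token that heads verb-typical dependency relations."""
    verbish_relations = {"xcomp", "ccomp", "nsubj"}  # relations a verb usually heads
    heads_verbish = any(rel in verbish_relations
                        for head, rel, dep in deps if head == token)
    return pos_tags.get(token, "").startswith("NN") and heads_verbish

# Hypothetical POS tags and dependency edges (head, relation, dependent)
pos = {"users": "NNS", "desire": "NN", "sell": "VB"}
edges = [("desire", "nsubj", "users"), ("desire", "xcomp", "sell")]
print(verb_like_noun("desire", pos, edges))  # True: noun tag, verb-like structure
```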
For real-time ambiguity detection, combining log-probabilities, multiple parses, and external taggers is the best approach.
Upvotes: 0