Davit Maghaltadze

Reputation: 21

Named Entity Recognition confidence

I need to get a confidence score for each extracted entity (as a value I can use, not just print), but I can't find a method that returns confidences.

First, I tried the Stanford Named Entity Recognizer library in Java with this solution:

Display Stanford NER confidence score

but it doesn't work (I guess the getCliqueTree method is not available). I also tried using NLTK in Python with the Stanford NER model to extract entities, but again couldn't find a way to get confidences.

I know how to do it in spaCy:

https://github.com/explosion/spaCy/issues/831

but, as the author says, it's inefficient.

So, can you please advise me on how to get the probability of each extracted entity?

Upvotes: 2

Views: 1200

Answers (1)

nmq

Reputation: 3154

NER is usually a token-level classification task.

Confidences are usually derived from each token's prediction, which is commonly the output of some type of softmax.

The issue then becomes: how do you get a single confidence for an entity from a sequence of token confidences?

There are multiple ways:

  1. Entropy [Confidence is amount of information]
  2. Average (Mean) [Confidence is the average]
  3. Min/Max of confidences [Confidence is the min/max]

All of these give different answers; none is inherently "better", and the right choice really depends on your use case.
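As a rough illustration of the options listed above, here is a minimal Python sketch. It assumes you already have, for each token in an entity span, either the probability of the predicted label or the full softmax distribution over labels. The function names are hypothetical and not part of any NER library, and "entropy" is read here as the mean Shannon entropy of the per-token distributions, which is only one possible interpretation.

```python
import math


def aggregate_confidence(token_probs):
    """Combine per-token probabilities of the predicted label into span-level scores.

    token_probs: list of floats in [0, 1], one per token in the entity span
    (hypothetical input; how you obtain it depends on your model).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return {
        "mean": sum(token_probs) / len(token_probs),
        "min": min(token_probs),
        "max": max(token_probs),
    }


def token_entropy(distribution):
    """Shannon entropy (in nats) of one token's probability distribution over labels."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def mean_entropy(distributions):
    """Average per-token entropy over the span; lower means the model is more certain."""
    if not distributions:
        raise ValueError("distributions must not be empty")
    return sum(token_entropy(d) for d in distributions) / len(distributions)
```

The minimum is the most conservative choice (one uncertain token lowers the whole entity's score), while the mean smooths over a single weak token. Note that entropy runs in the opposite direction from the other scores: a higher value means less confidence.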

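If you would like to rank the possible entity types for a span, you can start with the following:

  1. For each candidate label, get the token confidences assuming every token in the span has that label
  2. Compute the entropy of each resulting confidence (probability) sequence
  3. Sort the candidate labels by entropy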
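The ranking steps above could be sketched as follows. This assumes each token's softmax output is available as a dict mapping label to probability, and it reads "entropy" as the average negative log-probability of the assumed label (a cross-entropy style score), which may not be exactly what was intended. The function name `rank_entity_types` and the `eps` parameter are hypothetical.

```python
import math


def rank_entity_types(token_distributions, eps=1e-12):
    """Rank candidate labels for a span, most plausible first.

    token_distributions: list of dicts, one per token, mapping label -> probability
    (hypothetical input format). Returns a list of (label, score) pairs sorted by
    ascending score, where a lower score means a more confident label.
    """
    if not token_distributions:
        raise ValueError("token_distributions must not be empty")

    labels = token_distributions[0].keys()
    scores = {}
    for label in labels:
        # Step 1: confidences assuming the same label for every token in the span.
        probs = [dist.get(label, 0.0) for dist in token_distributions]
        # Step 2: average negative log-probability; eps guards against log(0).
        scores[label] = -sum(math.log(max(p, eps)) for p in probs) / len(probs)

    # Step 3: sort so the lowest (most confident) score comes first.
    return sorted(scores.items(), key=lambda item: item[1])


# Usage with made-up probabilities for a two-token span:
span = [
    {"PER": 0.7, "ORG": 0.2, "LOC": 0.1},
    {"PER": 0.6, "ORG": 0.3, "LOC": 0.1},
]
ranking = rank_entity_types(span)
best_label, best_score = ranking[0]
```

If you need a probability-like number rather than a score, `math.exp(-best_score)` gives the geometric mean of the token probabilities for that label.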

Upvotes: -1
