Reputation: 315
I was testing out spaCy on a small sample text. Here's a simple program I wrote to do that:
import spacy

text = 'Delaney Schreiber did a thing!'
nlp = spacy.load('en_core_web_sm')  # small English pipeline
doc = nlp(text)

# Print every named entity the pipeline found
for ent in doc.ents:
    print(ent)
Running this code didn't produce any output, so I went ahead and tried tweaking the sentence to get anything out of it. Here's how I made it work:
I'm sure there are other tweaks that work too, but my main question is: why this inconsistency? And why does spaCy seemingly ignore proper nouns at the start of sentences?
Upvotes: 0
Views: 160
Reputation: 3174
Because the model is a neural network, it is always difficult to explain individual cases. The more context the model has indicating that something could be a name, the more likely it is to mark it as an entity. The smaller the model, the more often classification errors happen.
The small model correctly detects ~85% of person entities in the test set, and the large one around 86% (although in practice I often feel the difference in NER quality is larger than that). The Transformer-based English model scores over 90%.
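If you want to see the difference on your own sentence, here is a minimal sketch that runs the same texts through several pipelines. The helper names `extract_entities` and `compare_models` are made up for illustration, and it assumes the pipelines you pass in have already been installed (for example with `python -m spacy download en_core_web_lg`).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def extract_entities(nlp: Callable, texts: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each text to the (entity text, label) pairs the pipeline found."""
    return {
        text: [(ent.text, ent.label_) for ent in nlp(text).ents]
        for text in texts
    }


def compare_models(model_names: Iterable[str], texts: Iterable[str]) -> dict:
    """Run the same texts through several installed spaCy pipelines.

    Hypothetical helper: returns {model name: {text: [(entity, label), ...]}}.
    """
    import spacy  # imported lazily so extract_entities works without spaCy

    texts = list(texts)  # allow reuse across models even if a generator is passed
    results = {}
    for name in model_names:
        nlp = spacy.load(name)  # raises OSError if the pipeline is not installed
        results[name] = extract_entities(nlp, texts)
    return results
```

Something like `compare_models(['en_core_web_sm', 'en_core_web_lg', 'en_core_web_trf'], ['Delaney Schreiber did a thing!', 'Yesterday Delaney Schreiber did a thing!'])` then shows which pipeline picks up the name in which variant. The second sentence is just an example of adding context; results will vary by model version.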
If you are flexible about which framework you use to extract the entities, I can personally recommend flair for NER. It is really fast for single sentences (but not that fast for large datasets, because it does not have a batch processing method like spaCy has), and it scores around 94% for NER.
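For reference, a rough sketch of what tagging a single sentence with flair can look like. This assumes flair is installed (`pip install flair`) and uses its standard English `ner` model, which is downloaded on first use. The function names `spans_to_tuples` and `flair_entities` are my own, not part of flair.

```python
def spans_to_tuples(spans):
    """Convert flair entity spans into (text, label, confidence) tuples."""
    result = []
    for span in spans:
        label = span.get_label("ner")  # label object carrying value and score
        result.append((span.text, label.value, label.score))
    return result


def flair_entities(text, model_name="ner"):
    """Tag one sentence with a flair sequence tagger (hypothetical helper)."""
    # Imported lazily: flair is a separate, fairly heavy dependency
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)  # downloads the model on first use
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return spans_to_tuples(sentence.get_spans("ner"))
```

Calling `flair_entities('Delaney Schreiber did a thing!')` returns a list of tuples, one per detected entity. If you tag many sentences, load the tagger once and reuse it instead of reloading it on every call as this sketch does.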
Upvotes: 2