Rapid Readers

Reputation: 315

Why is spaCy inconsistent with smaller texts?

I was testing out spaCy on a small sample text. Here's a simple program I wrote to do that:

import spacy

text = 'Delaney Schreiber did a thing!'
nlp = spacy.load('en_core_web_sm')  # small English pipeline
doc = nlp(text)

# Print every named entity the pipeline found in the text
for ent in doc.ents:
    print(ent)

Running this code didn't produce any output, so I went ahead and tried tweaking the sentence to get anything out of it. Here's how I made it work:

I'm sure there are other tweaks that work too, but my main question is - why this inconsistency? And why does spaCy seemingly ignore proper nouns at the start of sentences?

Upvotes: 0

Views: 160

Answers (1)

EliasK93

Reputation: 3174

Since the model uses a neural network, it is always difficult to explain individual cases. The more indication or context the model has that something could be a name, the more likely it is to mark it as an entity. The smaller the model, the more often classification errors happen.
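To check the effect of context on your own install, you could run the same name through the pipeline in two phrasings. This is only a minimal sketch: it assumes `en_core_web_sm` is installed, the helper names `extract_entities` and `compare_phrasings` are made up for illustration, and the second sentence is the one used in the demo links below. I am not showing output because results can differ between model versions.

```python
def extract_entities(nlp, text):
    """Return (entity text, label) pairs found by a loaded spaCy pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def compare_phrasings():
    # Imported here so the helper above works with any callable pipeline.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumption: model is installed
    samples = [
        "Delaney Schreiber did a thing!",  # bare name at sentence start
        "Her name was Delaney Schreiber, and she did a thing!",  # more context
    ]
    for text in samples:
        print(repr(text), "->", extract_entities(nlp, text))
```

Calling `compare_phrasings()` prints the entity list for each phrasing, so you can see whether the extra context changes what the small model picks up.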

The small model detects ~85% of person entities in the test set correctly, and the large one around 86% (although in practice I often feel that the difference in NER quality is larger than that). The Transformer-based English model scores over 90%.

small model

https://huggingface.co/spacy/en_core_web_sm?text=Her+name+was+Delaney+Schreiber%2C+and+she+did+a+thing%21

large model

https://huggingface.co/spacy/en_core_web_lg?text=Her+name+was+Delaney+Schreiber%2C+and+she+did+a+thing%21

transformer model

https://huggingface.co/spacy/en_core_web_trf?text=Her+name+was+Delaney+Schreiber%2C+and+she+did+a+thing%21
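The comparison in the three links above can also be run locally. A rough sketch, assuming the three pipelines are installed (the transformer one additionally needs `spacy-transformers`); `person_entities` and `compare_models` are hypothetical helper names, and a model that is not installed is reported as `None` rather than raising.

```python
MODEL_NAMES = ["en_core_web_sm", "en_core_web_lg", "en_core_web_trf"]


def person_entities(nlp, text):
    """Return the text of every entity labelled PERSON."""
    return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]


def compare_models(text, model_names=MODEL_NAMES, load=None):
    """Map each model name to its PERSON entities, or None if it cannot be loaded."""
    if load is None:
        import spacy  # lazy import; `load` can be swapped out for testing

        load = spacy.load
    results = {}
    for name in model_names:
        try:
            nlp = load(name)
        except OSError:
            # spacy.load raises OSError when the package is not installed
            results[name] = None
            continue
        results[name] = person_entities(nlp, text)
    return results
```

For example, `compare_models("Delaney Schreiber did a thing!")` returns one list per model, which makes it easy to see where the small model misses a name that a larger one catches.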

If you are flexible about which framework you use to extract the entities, I can personally recommend flair for NER. It is really fast for single sentences, but not that fast for large datasets, because it does not have a batch processing method like spaCy has. It has a score of around 94% for NER.

flair

https://huggingface.co/flair/ner-english-large?text=Her+name+was+Delaney+Schreiber%2C+and+she+did+a+thing%21
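If you want to try the flair model from the link above in code, something like the following should work. Treat it as a sketch under assumptions: it expects a recent flair release where spans expose `get_label(...)` with `.value` and `.score`, the model is downloaded on first use, and `spans_to_tuples` and `tag_sentence` are illustrative names of my own.

```python
def spans_to_tuples(spans, label_type="ner"):
    """Convert flair spans into (text, label, rounded score) tuples."""
    result = []
    for span in spans:
        label = span.get_label(label_type)
        result.append((span.text, label.value, round(label.score, 3)))
    return result


def tag_sentence(text, model_name="flair/ner-english-large"):
    # Lazy imports so the helper above can be used without flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)  # large download on first call
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return spans_to_tuples(sentence.get_spans("ner"))
```

`tag_sentence("Delaney Schreiber did a thing!")` then returns the entities flair found, together with a confidence score for each one. Loading the tagger inside the function keeps the sketch short; for more than one sentence you would load it once and reuse it.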

Upvotes: 2
