Let's say I have 10 custom entities to recognize. Roughly how many annotated training sentences should I provide? Any rough idea helps.
Thank you in advance! :)
I am new to this, please help.
For a custom NER model in spaCy, you will need around 100 samples for each entity, and the dataset should be free of biases (for example, no entity type heavily over- or under-represented).
This is based on my experience.
Suggestion: you can explore a custom spaCy model, but for production or any serious project you can't rely on it alone. You will also need other NLP techniques, such as relation extraction, alongside it.
Hope this helps.
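To check a dataset against the "around 100 samples per entity" guideline, you can count the annotated spans per label before training. The sketch below assumes the `(text, {"entities": [(start, end, label)]})` tuple format often seen in spaCy v2 training examples; `TRAIN_DATA`, `count_entities`, and `report` are hypothetical names for illustration, and the max/min ratio is only a crude proxy for label imbalance, not a full bias check.

```python
from collections import Counter

# Hypothetical training data in the (text, {"entities": [(start, end, label)]})
# format; start/end are character offsets into the text.
TRAIN_DATA = [
    ("Apple is opening an office in Berlin.", {"entities": [(0, 5, "ORG"), (30, 36, "GPE")]}),
    ("Tim Cook visited Berlin.", {"entities": [(0, 8, "PERSON"), (17, 23, "GPE")]}),
]


def count_entities(data):
    """Count how many annotated spans exist for each entity label."""
    counts = Counter()
    for _text, annotations in data:
        for _start, _end, label in annotations.get("entities", []):
            counts[label] += 1
    return counts


def report(counts, expected_labels, min_per_label=100):
    """Return per-label counts (0 for labels never annotated), the labels
    below the threshold, and the max/min count ratio as a rough imbalance signal."""
    full = {label: counts.get(label, 0) for label in expected_labels}
    too_few = sorted(label for label, n in full.items() if n < min_per_label)
    lowest = min(full.values())
    imbalance = max(full.values()) / lowest if lowest else float("inf")
    return full, too_few, imbalance


counts = count_entities(TRAIN_DATA)
full, too_few, imbalance = report(counts, ["ORG", "GPE", "PERSON"])
print(full)        # {'ORG': 1, 'GPE': 2, 'PERSON': 1}
print(too_few)     # ['GPE', 'ORG', 'PERSON']
print(imbalance)   # 2.0
```

Passing the full list of intended labels (all 10 in your case) matters: a label you planned for but never annotated would otherwise be silently missing from the counts.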
To develop a custom NER model, you will need at least 50-100 occurrences of each entity, each in its proper context. If you have less data than that, your custom model will overfit. So, depending on your data, you will need at least 200 to 300 sentences.
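The link between "50-100 occurrences per entity" and a sentence count depends on how many annotated entities a typical sentence contains. A rough back-of-the-envelope sketch, assuming mentions are spread evenly across labels and every sentence carries the same number of annotated spans (`estimate_sentences` is a hypothetical helper):

```python
import math


def estimate_sentences(num_labels, mentions_per_label, mentions_per_sentence):
    """Rough number of annotated sentences needed, assuming mentions are spread
    evenly across labels and each sentence carries `mentions_per_sentence` spans."""
    total_mentions = num_labels * mentions_per_label
    return math.ceil(total_mentions / mentions_per_sentence)


# 10 entity types with 50 or 100 mentions each, at 1-3 annotated spans per sentence.
for per_label in (50, 100):
    for per_sentence in (1, 2, 3):
        print(per_label, per_sentence, estimate_sentences(10, per_label, per_sentence))
```

Under these assumptions, 10 entities need 500 to 1000 sentences at one mention per sentence, but 250 to 500 at two mentions per sentence, so a 200-300 sentence budget fits best when sentences typically contain several annotated entities.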