GeorgeOfTheRF

Reputation: 8844

How to train completely new entities instead of the pre-trained entities using a spaCy NER model?

How do I do transfer learning, i.e. take a pre-trained spaCy NER model and make it learn new entities specific to my use case?

For this, I have 100 new annotated training samples. The retrained model should predict only the new entities and none of the existing entities in the pre-trained spaCy model. Just adding the new entities to an existing model and ignoring the old entities during prediction doesn't make sense.

This official example describes how to add new entities alongside the existing pre-trained entities, but that's not what I want. I also have very few examples (only 100) with which to build a completely new NER model from scratch.

Edit: I want to identify all account numbers in an unstructured document.

Example: ("I would like to change address corresponding to my account 12345. Kindly let me know how to do it. ", [59, 64, 'accountnumber'])

Upvotes: 0

Views: 1453

Answers (2)

Sofie VL

Reputation: 3096

You mention that you only want to predict the new entities, and not the old ones. There is thus no reason to start from a pre-trained NER model. The features learnt for the other entity types (that you don't want) won't be used or transferred to your new entity type anyway. So you'll just have to start training a model from scratch.
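A minimal sketch of what training from scratch could look like, assuming spaCy v3 (the v2 API is different). The two training sentences and the `train_blank_ner` helper are made up for illustration; with real data you would pass your own annotated samples in the same format.

```python
import random

import spacy
from spacy.training import Example

# Toy data in spaCy's (text, {"entities": [(start, end, label)]}) format.
# Offsets are character positions; the end offset is exclusive.
TRAIN_DATA = [
    ("I would like to change address corresponding to my account 12345.",
     {"entities": [(59, 64, "accountnumber")]}),
    ("Please close account 98765 today.",
     {"entities": [(21, 26, "accountnumber")]}),
]


def train_blank_ner(train_data, n_iter=20, seed=0):
    """Train an NER component on a blank English pipeline (hypothetical helper)."""
    random.seed(seed)
    # A blank pipeline has no pre-trained components, so none of the
    # stock entity types can ever be predicted.
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        nlp.update(examples, sgd=optimizer, losses=losses)
    return nlp


nlp = train_blank_ner(TRAIN_DATA)
```

Because the pipeline starts out blank, the only label the resulting model knows is the one in the training data. Two sentences are of course far too few to learn anything useful; the loop only shows the mechanics.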

You also mention that you only have a few training examples (100), so it will be a challenge to achieve high enough accuracy. Perhaps you could consider running a rule-based matching step first, and then manually consolidate the hits from that matching step to augment your training data more quickly.
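As a rough illustration of such a rule-based pre-annotation step, here is a sketch that uses a plain regular expression as a stand-in for spaCy's own rule-based matching. The pattern (a run of 5 to 12 digits) is only a guess at what an account number looks like, and `propose_annotations` is a hypothetical helper; every hit still needs manual review before it goes into the training set.

```python
import re

# Assumption: account numbers are standalone runs of 5 to 12 digits.
# Adjust the pattern to whatever your real account numbers look like.
ACCOUNT_PATTERN = re.compile(r"\b\d{5,12}\b")


def propose_annotations(texts, label="accountnumber"):
    """Return candidate (text, {"entities": [...]}) pairs for manual review."""
    proposals = []
    for text in texts:
        spans = [
            (match.start(), match.end(), label)
            for match in ACCOUNT_PATTERN.finditer(text)
        ]
        if spans:  # skip texts without any candidate hit
            proposals.append((text, {"entities": spans}))
    return proposals


unlabelled = [
    "I would like to change address corresponding to my account 12345. "
    "Kindly let me know how to do it.",
    "There is no account number in this sentence.",
]
candidates = propose_annotations(unlabelled)
```

The candidates come out in the same (text, annotations) shape that spaCy training data uses, so confirmed hits can be appended to the training set directly.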

Upvotes: 2

Adnan S

Reputation: 1882

For your use case, you are adding a new entity type so there should not be confusion with existing entity types. If you call your new entity "accountnumber", you should be able to use the training script you linked to train a model.

For the extraction phase, use the code in the documentation but filter the results for "accountnumber" (i.e. check the ent.label_ field) and ignore the other, pre-existing entities.
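The filtering step could look something like the sketch below. The `account_numbers` helper is hypothetical; it only relies on the `ents`, `text` and `label_` attributes that a spaCy `Doc` and its entity spans expose. To keep the snippet runnable without a trained model, the demo uses a hand-built stand-in object instead of a real `Doc`.

```python
from types import SimpleNamespace


def account_numbers(doc, label="accountnumber"):
    """Keep only the entities with the wanted label and drop all others."""
    return [ent.text for ent in doc.ents if ent.label_ == label]


# Stand-in for a processed spaCy Doc (assumption: a real pipeline would
# produce something equivalent via doc = nlp(text)).
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="London", label_="GPE"),
        SimpleNamespace(text="12345", label_="accountnumber"),
    ]
)
found = account_numbers(fake_doc)
```

With a real model you would call `account_numbers(nlp(text))`; entities such as GPE or ORG are still predicted internally but never reach your results.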

Upvotes: 1
