Reputation: 1782
I am trying to add custom NER labels using spaCy 3. I found tutorials for older versions and adapted them for spaCy 3. Here is the whole code I am using:
import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]})
]

nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names)  # Here I see that the new label was added

optimizer = nlp.create_optimizer()

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)

# test the trained model
# add some dummy sentences with many NERs
test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)
This code raises a ValueError, but only after two iterations; notice the first two lines of output:
{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
File ".\custom_ner_training.py", line 46, in <module>
nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
File "C:\ogr\moje\python\spacy_pg\myvenv\lib\site-packages\spacy\language.py", line 1106, in update
proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError
I can see that the ANIMAL label was added by calling ner.move_names.
When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON on the new data. This is why I assume there is no error in the code itself.
Is there something I am missing? What am I doing wrong? Could someone please try to reproduce this?
NOTE: This is my first question here. I hope I have provided all the necessary information; if not, let me know in the comments.
Upvotes: 1
Views: 1273
Reputation: 391
Another possible cause is misaligned entity offsets in the training data. Check whether the training texts contain extra spaces. If they do, first remove the extra spaces from the text, then recompute the start and end character positions of each label within the cleaned text.
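A minimal, standard-library-only sketch of that cleanup, assuming entities are stored as (start, end, label) tuples as in the question's TRAIN_DATA. The function name normalize_example is hypothetical and not part of spaCy. It collapses each run of whitespace into a single space and strips leading and trailing whitespace. Instead of searching for the entity text again, it remaps every offset through an old-to-new index table, so repeated words in a sentence cannot be confused.

```python
def normalize_example(text, entities):
    """Collapse whitespace in `text` and remap (start, end, label) offsets.

    Hypothetical helper, not part of spaCy. `entities` must index into the
    original `text`; the returned offsets index into the cleaned text.
    """
    new_chars = []
    old_to_new = []  # old_to_new[i]: position in the new text for original index i
    for ch in text:
        if ch.isspace():
            # Drop leading whitespace and every whitespace char after the first in a run.
            if not new_chars or new_chars[-1] == " ":
                old_to_new.append(len(new_chars))
                continue
            old_to_new.append(len(new_chars))
            new_chars.append(" ")
        else:
            old_to_new.append(len(new_chars))
            new_chars.append(ch)
    # Extra slot so an end offset equal to len(text) can be remapped as well.
    old_to_new.append(len(new_chars))
    # Trailing whitespace leaves one space behind; remove it.
    if new_chars and new_chars[-1] == " ":
        new_chars.pop()
    new_text = "".join(new_chars)
    new_entities = [
        (old_to_new[start], min(old_to_new[end], len(new_text)), label)
        for start, end, label in entities
    ]
    return new_text, new_entities


text = "they pretend   to care about your feelings,  those horses"
start = text.index("horses")
new_text, new_entities = normalize_example(text, [(start, start + 6, "ANIMAL")])
print(new_text)      # they pretend to care about your feelings, those horses
print(new_entities)  # [(48, 54, 'ANIMAL')]
```

The cleaned text and offsets can then be passed to Example.from_dict as usual, with the Doc built from the cleaned text.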
Upvotes: 0
Reputation: 1026
You need to change the following line in the for loop:
    doc = nlp(text)
to
    doc = nlp.make_doc(text)
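With that change, the code should work and produce output similar to the following (the exact loss values vary between runs because of the shuffling and dropout):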
{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
...
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}
Entities in 'Do you like horses?'
ANIMAL -- horses
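For reference, here is a minimal sketch of the training loop with that one change applied, assuming spaCy 3.x. To make it runnable without downloading en_core_web_sm, it substitutes spacy.blank("en") with a fresh "ner" component and calls nlp.initialize() instead of nlp.create_optimizer(). Those substitutions are assumptions for illustration, not part of the answer above. The relevant difference from the question is that each Example is built from nlp.make_doc(text), which only runs the tokenizer. By contrast, nlp(text) also runs every pipeline component on the text before the Example is created.

```python
import random

import spacy
from spacy.training import Example

LABEL = "ANIMAL"
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {"entities": [(48, 54, LABEL)]}),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]

# Assumption: a blank English pipeline stands in for en_core_web_sm.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label(LABEL)

# make_doc only tokenizes, so the predicted Doc carries no pipeline output.
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# Fresh component, so initialize it (this also returns an optimizer).
optimizer = nlp.initialize(get_examples=lambda: examples)
for itn in range(20):
    random.shuffle(examples)
    losses = {}
    for example in examples:
        nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
    print(losses)

doc = nlp("Do you like horses?")
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)
```

If you keep the pretrained en_core_web_sm pipeline as in the question, spaCy 3 also provides nlp.resume_training() to get an optimizer without reinitializing the existing weights. It also offers nlp.select_pipes(enable="ner") as the newer equivalent of disable_pipes. With so little data, the sketch may or may not tag "horses" in the test sentence.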
Upvotes: 2