Reputation: 151
Use case
I am creating a program that annotates sentences in a paragraph containing specific terms, for quick contract analysis, but I am noticing that the annotation is not highlighting the correct words. Please see the training data in the code below; the testing data is also shown below.
I would expect my code to identify
"MrAsells a product that he warrants for at least one year" when it finds the term "Warranty", and
"The minimum payment terms acceptable to our firm are Net 90 days" when it sees "Payment Terms", based on the training data.
However, the output of the code is shown below; the "Marketing" label does not align at all with "Payment Terms":
WORK_OF_ART -- MrX
Marketing -- MrAsells a product that he warrants for at least one year and is hopeful that he receives the payment for the product within 70 days.
Marketing -- MrB is not allowed to share any logo that he might use during the project phase with other clients as a promotional item.
# Testing data, imported using docx2txt
"MrB expects MrX to take responsibility for owning client data to highest standard. CompanyA is an affiliate of CompanyB. MrAsells a product that he warrants for at least one year and is hopeful that he receives the payment for the product within 70 days. MrB is not allowed to share any logo that he might use during the project phase with other clients as a promotional item"
#Code
import spacy
import random
from spacy.training import Example
import docx2txt
from spacy import displacy
import pandas as pd
import docx

#nlp = spacy.blank('en')
nlp = spacy.load('en_core_web_sm')
ner = nlp.get_pipe("ner")

if 'ner' not in nlp.pipe_names:
    ner_pipe = nlp.create_pipe('ner')
    nlp.add_pipe(ner_pipe, last=True)
else:
    ner_pipe = nlp.get_pipe('ner')

TRAIN_DATA = [
    ("The minimum payment terms acceptable to our firm are Net 90 days.", {"entities": [(0, 62, "Payment Terms")]}),
    ("We do not allow anyone to share our logo for marketing purpose.", {"entities": [(0, 63, "Marketing")]}),
    ("We expect that the firm will honor our warranty requirement of atleast one year.", {"entities": [(39, 48, "Warranty")]}),
]

for _, annotations in TRAIN_DATA:
    for entity in annotations['entities']:
        ner.add_label(entity[2])

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # only train NER
    optimizer = nlp.create_optimizer()
    for iteration in range(200):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.3)

# test the trained model; add some dummy sentences with many NERs
test_text = docx2txt.process('C:/users/Siddk/Testing.docx')
doc = nlp(test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)
Upvotes: 2
Views: 391
Reputation: 15623
In your training data you have three examples. I assume that is just an abridged example for Stack Overflow, but just in case: you cannot train a model from three examples; you need at least hundreds.
More generally, you cannot use NER to tag whole sentences, especially spaCy's NER. From the docs:
The transition-based algorithm also assumes that the most decisive information about your entities will be close to their initial tokens. If your entities are long and characterized by tokens in their middle, the component will likely not be a good fit for your task.
Of your three examples, in two cases you have labeled whole sentences. The model will not be able to learn this.
There are a couple of things you can do instead. One is to use a text classifier on sentences. Another is to look at the SpanCategorizer, which will be released soon as an experimental feature.
I would suggest the classification approach, though: the beginnings and ends of spans aren't really important in your examples; it seems like you just want to classify sentences. A rough sketch of that approach is below.
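Here is a minimal sketch of the sentence-classification idea, assuming spaCy 3.x. The label names and the tiny training set are taken from your question purely to show the mechanics (a usable model still needs hundreds of examples), and the test sentence is a stand-in for your docx2txt output:

import random
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
# "textcat" assumes exactly one label per sentence; use "textcat_multilabel" if a sentence can carry several
textcat = nlp.add_pipe("textcat")
for label in ("Payment Terms", "Marketing", "Warranty"):
    textcat.add_label(label)

# Each sentence gets a score per category instead of character-offset entity spans.
TRAIN_DATA = [
    ("The minimum payment terms acceptable to our firm are Net 90 days.",
     {"cats": {"Payment Terms": 1.0, "Marketing": 0.0, "Warranty": 0.0}}),
    ("We do not allow anyone to share our logo for marketing purposes.",
     {"cats": {"Payment Terms": 0.0, "Marketing": 1.0, "Warranty": 0.0}}),
    ("We expect that the firm will honor our warranty requirement of at least one year.",
     {"cats": {"Payment Terms": 0.0, "Marketing": 0.0, "Warranty": 1.0}}),
]

examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in TRAIN_DATA]
optimizer = nlp.initialize(lambda: examples)
for _ in range(20):
    random.shuffle(examples)
    nlp.update(examples, sgd=optimizer)

# Split the contract text into sentences and classify each one separately.
splitter = spacy.blank("en")
splitter.add_pipe("sentencizer")
test_text = "MrA sells a product that he warrants for at least one year."  # stand-in for docx2txt.process(...)
for sent in splitter(test_text).sents:
    scores = nlp(sent.text).cats
    print(max(scores, key=scores.get), "--", sent.text)

With this setup you would still extract the contract text with docx2txt as before, but each sentence is fed through the classifier and gets one label; the NER component is not involved at all.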
Upvotes: 1