Reputation: 447
I'm using spaCy's Named Entity Recognition to figure out the food word in a sentence. This is the code that I have:
import spacy
nlp = spacy.load('en_core_web_sm')
sentence = "I like to eat pizza."
doc = nlp(sentence)
for ent in doc.ents:
    print(ent.text, ent.label_)
Why is it not printing "pizza"? According to spaCy's entity types, food belongs to the PRODUCT entity type, so shouldn't "pizza" be printed for ent.text and PRODUCT for ent.label_?
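One way to see which entity labels the loaded model can assign at all is to inspect the NER component directly; a minimal sketch (note that PRODUCT appearing in the label set does not guarantee the statistical model will tag any particular food word):

import spacy

nlp = spacy.load('en_core_web_sm')
# print the entity labels the pretrained NER component knows about
print(nlp.get_pipe('ner').labels)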
Upvotes: 0
Views: 1616
Reputation: 11
I had the same issue and trained spaCy with a few examples.
So, grab a few sentences (even 3-4 will start to work) and manually extract the products into a list, so that you end up with a dictionary of texts and lists of products (sketched below). Then adapt the getSpans function that follows.
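For illustration, the training data could be shaped like this (hypothetical sentences and the hypothetical name TRAIN_DATA):

# each hand-labelled text maps to the list of product strings it contains
TRAIN_DATA = {
    "I like to eat pizza.": ["pizza"],
    "We ordered two burgers and a cola.": ["burgers", "cola"],
    "She bought fresh sushi for lunch.": ["sushi"],
}

Each entry is turned into one training Example by the function below.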
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.training import Example
from spacy.util import filter_spans

def getSpans(ner_model=None, products=[], nameForNewLabel='PRODUCTS', doc=None):
    # create one pattern Doc per product string
    patterns = [ner_model.make_doc(product) for product in products]
    # match the patterns against the doc (overlaps are dealt with below)
    matcher = PhraseMatcher(ner_model.vocab)
    matcher.add(nameForNewLabel, patterns)  # add patterns to matcher
    matches = matcher(doc)
    # now create spans
    spans = []
    for match_id, start, end in matches:
        # create a new Span for each match and use the match_id (PRODUCTS) as the label
        span = Span(doc, start, end, label=match_id)
        print(span.text, span.start_char, span.end_char, span.label_,
              "'" + doc.text[span.start_char:span.end_char] + "'",
              doc.text[span.start_char:span.end_char] in products)
        # add to spans
        spans.append(span)
    # Filter a sequence of Span objects and remove duplicates or overlaps. Useful for creating
    # named entities (where one token can only be part of one entity) or when merging spans
    # with Retokenizer.merge. When spans overlap, the (first) longest span is preferred.
    filtered_spans = filter_spans(spans)
    doc.ents = filtered_spans
    # create an Example: a fresh predicted doc plus the reference doc carrying the gold entities
    eg = Example(ner_model.make_doc(doc.text), doc)
    return eg
where
doc = ner_model.make_doc(text)
and
ner_model = spacy.blank('en') # create blank Language class
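Putting those pieces together, the wiring could look roughly like this (a sketch assuming the TRAIN_DATA shape above; names are illustrative):

import spacy

ner_model = spacy.blank('en')  # create blank Language class
ner_model.add_pipe('ner')      # a blank pipeline needs an NER component before training

examples = []
for text, products in TRAIN_DATA.items():
    doc = ner_model.make_doc(text)
    examples.append(getSpans(ner_model=ner_model, products=products, doc=doc))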
Then train the model; a minimal update loop is sketched below. Once it has trained for e.g. 200 epochs with batch_size equal to the number of examples, you will see that it works.
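A minimal spaCy v3-style update loop (a rough sketch assuming the examples list built above):

import random
from spacy.util import minibatch

optimizer = ner_model.initialize(get_examples=lambda: examples)
for epoch in range(200):
    random.shuffle(examples)
    losses = {}
    # batch_size equal to the number of examples, as suggested above
    for batch in minibatch(examples, size=len(examples)):
        ner_model.update(batch, sgd=optimizer, losses=losses)
    print(epoch, losses)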
I cannot share my entire code as I am using it for products in a private equity AI company, but with the above I am sure you can get there.
Upvotes: 0