Reputation: 148
The following code fragment (modified from spaCy sample code) generates a KeyError that I just can't figure out:
import en_core_web_sm
from spacy.gold import GoldParse
nlp = en_core_web_sm.load()
nlp.entity.add_label('ACCT')
TRAIN_DATA = [
    ("Exxon opened a new processing facility", {
        "entities": [(0, 5, "ACCT")]
    }),
    ("another example sentence", {
        "entities": []
    }),
    ("Shell is an oil company, and so is Chevron.", {
        "entities": [(0, 5, "ACCT"), (35, 42, "ACCT")]
    }),
    ("Texaco?", {
        "entities": [(0, 6, "ACCT")]
    })
]
# Add new words to vocab
for raw_text, _ in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    for word in doc:
        _ = nlp.vocab[word.orth]
loss = 0.
for raw_text, entity_offsets in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    gold = GoldParse(doc, entities=entity_offsets)
    loss += nlp.entity.update(doc, gold, drop=0.9)
The error is:
KeyError                                  Traceback (most recent call last)
<ipython-input-27-bbf3e1dc4d39> in <module>()
     33 for raw_text, entity_offsets in TRAIN_DATA:
     34     doc = nlp.make_doc(raw_text)
---> 35     gold = GoldParse(doc, entities=entity_offsets)
     36     loss += nlp.entity.update(doc, gold, drop=0.9)
     37
gold.pyx in spacy.gold.GoldParse.__init__()
KeyError: 0
I'm seeing this error with spaCy 2.0.3 as well as spaCy 1.9.
When I run similar code in a Flask app, I get additional trace information suggesting that the line that actually fails is elif not isinstance(entities[0], basestring): in the gold.pyx file.
Can anyone help explain what's happening?
Upvotes: 2
Views: 1846
Reputation: 148
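I don't know how the spaCy sample code ever worked, but the GoldParse constructor wants entities to be a list, not a dict. In the code from the question, entity_offsets is the whole annotation dict ({"entities": [...]}), so the entities[0] check in gold.pyx becomes a dictionary lookup for the key 0, which raises KeyError: 0. Changing the line to: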
gold = GoldParse(doc, entities=entity_offsets.get('entities'))
fixed the problem.
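To see why the dict triggers KeyError: 0, here is a minimal sketch in plain Python that does not need spaCy. It only mimics the entities[0] access mentioned in the question; the variable names annotations, error, and first are made up for illustration and are not part of spaCy.

```python
# One annotation in the same shape as the TRAIN_DATA entries in the question.
annotations = {"entities": [(0, 5, "ACCT")]}

# Indexing the dict itself with 0 is a key lookup, and there is no key 0,
# so Python raises KeyError with 0 as its argument.
try:
    annotations[0]
    error = None
except KeyError as exc:
    error = exc

print(type(error).__name__, error.args)

# Indexing the list stored under "entities" returns the first offset tuple.
entity_offsets = annotations.get("entities")
first = entity_offsets[0]
print(first)
```

Unpacking the list before calling GoldParse, as in the corrected line, means the entities[0] check sees a tuple instead of performing a dict lookup.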
Upvotes: 2