BriWill

Reputation: 148

KeyError in spaCy GoldParse

The following code fragment (modified from spaCy sample code) generates a KeyError that I just can't figure out:

import en_core_web_sm
from spacy.gold import GoldParse

nlp = en_core_web_sm.load()
nlp.entity.add_label('ACCT')

TRAIN_DATA = [

    ("Exxon opened a new processing facility", {
        "entities": [(0, 5, "ACCT")]
    }),

    ("another example sentence", {
        "entities": []
    }),

    ("Shell is an oil company, and so is Chevron.", {
        "entities": [(0, 5, "ACCT"), (35, 42, "ACCT")]
    }),

    ("Texaco?", {
        "entities": [(0, 6, "ACCT")]
    })
]

# Add new words to vocab
for raw_text, _ in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    for word in doc:
        _ = nlp.vocab[word.orth]

loss = 0.
for raw_text, entity_offsets in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    gold = GoldParse(doc, entities=entity_offsets)
    loss += nlp.entity.update(doc, gold, drop=0.9)

The error is:

KeyError                                  Traceback (most recent call last)
<ipython-input-27-bbf3e1dc4d39> in <module>()
     33 for raw_text, entity_offsets in TRAIN_DATA:
     34     doc = nlp.make_doc(raw_text)
---> 35     gold = GoldParse(doc, entities=entity_offsets)
     36     loss += nlp.entity.update(doc, gold, drop=0.9)
     37 

gold.pyx in spacy.gold.GoldParse.__init__()

KeyError: 0

I'm seeing this error with spaCy 2.0.3 as well as spaCy 1.9.

When I run similar code in a Flask app, I get additional trace information that suggests the actual line that is failing is elif not isinstance(entities[0], basestring): in the gold.pyx file.

Can anyone help explain what's happening?

Upvotes: 2

Views: 1846

Answers (1)

BriWill

Reputation: 148

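I don't know how the spaCy sample code ever worked, but the GoldParse constructor expects entities to be a list of (start, end, label) tuples, not a dict. Each item in TRAIN_DATA pairs the text with a dict of the form {"entities": [...]}, so entity_offsets is that dict, and the entities[0] check mentioned in the question looks up the key 0 in it, which raises KeyError: 0. Changing the line to: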

gold = GoldParse(doc, entities=entity_offsets.get('entities'))

fixed the problem.
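
The KeyError can be reproduced without spaCy at all, assuming the failing line really is the entities[0] lookup that the question's trace points to. A minimal sketch in plain Python, reusing one item from TRAIN_DATA (the variable names here are only for illustration):

```python
# One training item in the shape used by the question: (text, annotations dict).
text, annotations = ("Shell is an oil company, and so is Chevron.", {
    "entities": [(0, 5, "ACCT"), (35, 42, "ACCT")]
})

# Indexing the dict with 0 looks up the key 0, which does not exist,
# so Python raises KeyError with 0 as its argument.
try:
    annotations[0]
    error = None
except KeyError as exc:
    error = exc

# Pulling the list out of the dict first gives the (start, end, label) tuples.
entities = annotations["entities"]
spans = [(text[start:end], label) for start, end, label in entities]

print(type(error).__name__, error.args)
print(spans)
```

Passing the extracted list rather than the whole dict is the same change as the one-line fix above.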

Upvotes: 2
