Evan Lalo

Reputation: 1273

Understanding spaCy's Scorer Output

I'm evaluating a custom NER model that I built with spaCy, using spaCy's Scorer class to score each training set.

    import spacy
    from spacy.gold import GoldParse
    from spacy.scorer import Scorer

    def Eval(examples):
        # load the saved model
        print("Loading from", './model6/')
        ner_model = spacy.load('./model6/')

        scorer = Scorer()
        try:
            for input_, annot in examples:
                # tokenize the raw text and attach the gold-standard entities
                doc_gold_text = ner_model.make_doc(input_)
                gold = GoldParse(doc_gold_text, entities=annot['entities'])
                # run the model and score its prediction against the gold parse
                pred_value = ner_model(input_)
                scorer.score(pred_value, gold)
        except Exception as e:
            print(e)

        print(scorer.scores)
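
For reference, examples is a list of (text, annotations) tuples in spaCy's v2 training-data format. Here's a minimal example of how I call it (the sentence and offsets are illustrative, borrowed from spaCy's docs, not my actual data):

    examples = [
        ('Who is Shaka Khan?', {'entities': [(7, 17, 'PERSON')]}),
    ]
    Eval(examples)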

It works fine, but I don't understand the output. Here's what I get for each training set.

{'uas': 0.0, 'las': 0.0, 'ents_p': 90.14084507042254, 'ents_r': 92.7536231884058, 'ents_f': 91.42857142857143, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 91.12227805695142, 'ents_r': 93.47079037800687, 'ents_f': 92.28159457167091, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.45614035087719, 'ents_r': 92.9453262786596, 'ents_f': 92.70008795074759, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 94.5993031358885, 'ents_r': 94.93006993006993, 'ents_f': 94.76439790575917, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.07920792079209, 'ents_r': 93.15525876460768, 'ents_f': 92.61410788381743, 'tags_acc': 0.0, 'token_acc': 100.0}

Does anyone know what the keys mean? I've looked through spaCy's documentation and could not find anything.

Thanks!

Upvotes: 18

Views: 10441

Answers (1)

mcoav

Reputation: 1616

  • UAS (Unlabelled Attachment Score) and LAS (Labelled Attachment Score) are standard metrics for evaluating dependency parsing. UAS is the proportion of tokens whose head has been correctly assigned; LAS is the proportion of tokens whose head has been correctly assigned with the right dependency label (subject, object, etc.).
  • ents_p, ents_r, ents_f are the precision, recall and F-score for the NER task (see the worked sketch after this list).
  • tags_acc is the POS tagging accuracy.
  • token_acc seems to be the precision of token segmentation.
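
For intuition, here's how ents_p, ents_r and ents_f are derived from entity spans, as a minimal sketch. The (start, end, label) tuples below are made up for illustration, not taken from the question's data:

    # hypothetical gold and predicted entity spans: (start, end, label)
    gold_ents = {(0, 4, 'PERSON'), (10, 16, 'ORG'), (20, 25, 'GPE')}
    pred_ents = {(0, 4, 'PERSON'), (10, 16, 'ORG'), (30, 35, 'GPE')}

    tp = len(gold_ents & pred_ents)   # spans predicted correctly
    fp = len(pred_ents - gold_ents)   # spans predicted but not in the gold data
    fn = len(gold_ents - pred_ents)   # gold spans the model missed

    ents_p = 100 * tp / (tp + fp)                         # precision
    ents_r = 100 * tp / (tp + fn)                         # recall
    ents_f = 2 * ents_p * ents_r / (ents_p + ents_r)      # harmonic mean (F-score)

    print(ents_p, ents_r, ents_f)     # 66.67 66.67 66.67 (rounded)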

Upvotes: 24
