How are P, R, and F scores calculated in spaCy CLI train NER?

Question

I am using the spaCy CLI train command for NER with train_path set to the training dataset (train-set) and dev_path set to the evaluation dataset (test-set). The printout in the console shows me NER Precision, Recall, and the F-score.

However, it is not clear to me how the scores were calculated. Are they the scores from the model predicting on the train-set (train-scores) or from the test-set (test-scores)?

I want to determine after which epoch to stop training to prevent overfitting. Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing. It seems to me that the model might be memorizing the training data and that the P, R, and F-scores are calculated on the train-set and thus keep improving.

To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing. So I would like to compare them over time (epochs).

My questions are:

1. Are the scores displayed in the console during training train-scores or test-scores?
2. How can I get access to the other one?
3. If they are train-scores, what is the test-set (dev_path) used for?

Sofie VL · Accepted Answer

The loss is calculated from the training examples, as a side effect of calling nlp.update() in the training loop. All the other performance metrics (P, R, and F) are calculated on the dev set, i.e. the data you pass as dev_path, by calling the Scorer.
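To make the P, R, and F numbers concrete, here is a minimal pure-Python sketch of entity-level scoring on a dev set. It is not spaCy's Scorer code; it assumes that a predicted entity counts as correct only if its start, end, and label all match a gold entity exactly. The function name ner_prf and the example spans are made up for illustration.

```python
from typing import List, Set, Tuple

# An entity is (start_char, end_char, label). Hypothetical representation.
Span = Tuple[int, int, str]


def ner_prf(gold_docs: List[Set[Span]], pred_docs: List[Set[Span]]):
    """Return (precision, recall, f_score) over all documents, exact-match."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # predicted and in the gold annotations
        fp += len(pred - gold)  # predicted but not in the gold annotations
        fn += len(gold - pred)  # in the gold annotations but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Made-up dev data: two documents, one entity predicted with the wrong label.
gold = [{(0, 5, "PERSON"), (10, 16, "ORG")}, {(3, 9, "GPE")}]
pred = [{(0, 5, "PERSON"), (10, 16, "GPE")}, {(3, 9, "GPE")}]
p, r, f = ner_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Because the gold spans come from the dev set and never pass through nlp.update(), scores computed this way say something about generalization, while the loss only reflects the training examples.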

"To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing"

Yep, I agree. So looking at the spacy train results, this would be the point where the (training) loss is still decreasing while the (dev) F-score starts decreasing again.
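That stopping rule can be sketched in a few lines of plain Python. This is only an illustration, not something spacy train does for you in this form: the helper name best_epoch, the patience parameter, and the F-score values are all hypothetical, and you would fill the list with the dev F-scores read off the console output.

```python
from typing import List, Tuple


def best_epoch(dev_f_scores: List[float], patience: int = 3) -> Tuple[int, float]:
    """Return (epoch_index, f_score) of the best dev F-score seen before the
    score failed to improve for `patience` consecutive epochs."""
    best_idx = -1
    best_f = float("-inf")
    for epoch, f in enumerate(dev_f_scores):
        if f > best_f:
            best_idx, best_f = epoch, f
        elif epoch - best_idx >= patience:
            break  # no improvement for `patience` epochs: stop here
    return best_idx, best_f


# Made-up per-epoch dev F-scores (0-indexed epochs).
dev_f = [70.1, 74.3, 76.0, 76.8, 76.5, 76.6, 76.2, 77.5]
print(best_epoch(dev_f, patience=3))
```

With a patience of 3, the loop stops before it reaches the last value, so a small patience can end training too early when the dev F-score is noisy. As long as the dev F-score keeps reaching new highs, the rule never triggers.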

"Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing."

So it looks like you can train for some epochs more :-)
