I am using the spaCy CLI train command for NER, with train_path set to the training dataset (train-set) and dev_path set to the evaluation dataset (test-set). The console output shows NER precision, recall, and F-score.
However, it is not clear to me how these scores are calculated. Are they computed from the model's predictions on the train-set (train-scores) or on the test-set (test-scores)?
I want to determine after which epoch to stop training to prevent overfitting. Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing. It seems to me that the model might be memorizing the training data and that the P, R, and F-scores are calculated on the train-set and thus keep improving.
To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing. So I would like to compare them over time (epochs).
My questions are:
… (dev_path) used?
The loss is calculated from the training examples, as a side effect of calling nlp.update() in the training loop. However, all the other performance metrics are calculated on the dev set, by calling the Scorer.
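To make "calculated on the dev set" concrete, here is a minimal sketch of how exact-match, entity-level precision, recall, and F-score can be computed from gold and predicted entities. It assumes each entity is a (start, end, label) tuple and that a prediction only counts when boundaries and label both match exactly. This illustrates the general idea behind an NER scorer; it is not spaCy's actual Scorer code, and ner_prf is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start_token, end_token, label), end exclusive.
Span = Tuple[int, int, str]


def ner_prf(gold_docs: Iterable[Set[Span]],
            pred_docs: Iterable[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match precision/recall/F1 over all documents."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # predicted entities that exactly match a gold entity
        fp += len(pred - gold)  # predicted entities with no exact gold match
        fn += len(gold - pred)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


if __name__ == "__main__":
    # Made-up dev examples for illustration only.
    gold = [{(0, 2, "PERSON"), (5, 6, "GPE")}, {(1, 3, "ORG")}]
    pred = [{(0, 2, "PERSON"), (5, 6, "ORG")}, {(1, 3, "ORG"), (4, 5, "DATE")}]
    print(ner_prf(gold, pred))
```

Because these numbers come only from the dev documents, they can start dropping while the training loss keeps going down, which is exactly the overfitting signal you are looking for.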
"To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing"
Yep, I agree. So in the spacy train output, this would be the point where the (training) loss is still decreasing while the (dev) F-score starts decreasing again.
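That stopping rule can be sketched as patience-based early stopping on the dev F-score. This is a minimal sketch that assumes you have collected the dev F-score printed after each epoch into a list. best_stopping_epoch and the patience value are illustrative choices, not part of spaCy. Depending on your spaCy version, the train command may offer its own early-stopping option (check python -m spacy train --help).

```python
from typing import List, Optional


def best_stopping_epoch(dev_f_scores: List[float], patience: int = 3) -> Optional[int]:
    """Return the 1-based epoch with the best dev F-score seen before the scan
    stops, i.e. once `patience` consecutive epochs fail to beat the best score.

    Returns None if the list is empty.
    """
    best_epoch: Optional[int] = None
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(dev_f_scores, start=1):
        if score > best_score:
            best_score = score
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # dev F-score stalled or dropped for `patience` epochs in a row
    return best_epoch


if __name__ == "__main__":
    # Made-up per-epoch dev F-scores, for illustration only.
    scores = [70.1, 74.5, 76.0, 75.8, 75.2, 74.9, 77.0]
    print(best_stopping_epoch(scores, patience=3))
    print(best_stopping_epoch(scores, patience=5))
```

A small patience stops early at a local peak (epoch 3 here), while a larger one tolerates a temporary dip and finds the later improvement (epoch 7), so the patience value is a trade-off between training time and not giving up too soon.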
"Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing."
So it looks like you can train for a few more epochs :-)