Reputation: 343
I was playing with the Flair library to see whether there is a big difference, in terms of results, between fine-tuning (implemented separately) and embedding projection. The problem I'm facing is interpreting the results (in this case the experiment was done with BERT embeddings). In the training.log I get this:
2019-10-10 16:27:02,964 Testing using best model ...
2019-10-10 16:27:02,966 loading file best-model.pt
2019-10-10 16:37:23,793 0.7539 0.7539 0.7539
2019-10-10 16:37:23,795
MICRO_AVG: acc 0.605 - f1-score 0.7539
MACRO_AVG: acc 0.5467 - f1-score 0.6925
0 tp: 1420 - fp: 438 - fn: 144 - tn: 363 - precision: 0.7643 - recall: 0.9079 - accuracy: 0.7093 - f1-score: 0.8299
1 tp: 363 - fp: 144 - fn: 438 - tn: 1420 - precision: 0.7160 - recall: 0.4532 - accuracy: 0.3841 - f1-score: 0.5551
2019-10-10 16:37:23,796
My test dataset contains 2365 instances for a binary text classification task. What do the last two lines mean, i.e. the 0 and 1 followed by the true positives, precision, recall, and so on? What is 0, and what is 1? I also loaded the best model separately, tested it on my test dataset, and obtained different results.
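By "loaded separately" I mean something along these lines (a minimal sketch; the model path and example text are placeholders):

```python
from flair.data import Sentence
from flair.models import TextClassifier

# Load the best checkpoint written by the trainer (path is a placeholder).
classifier = TextClassifier.load('best-model.pt')

# Predict the label of a single test instance (text is a placeholder).
sentence = Sentence('an example document from my test set')
classifier.predict(sentence)
print(sentence.labels)
```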
Any help would be greatly appreciated.
Upvotes: 0
Views: 147
Reputation: 7369
Since you are fine-tuning for binary classification, precision, recall, and F1 score are the standard way to evaluate the model, and what you see is that evaluation of the model.
The first character, 0 or 1, indicates class 0 or class 1 (two classes, since it's binary classification). For each class, the line reports the number of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn). If you sum them all, you get the number of examples in your test set (1420 + 438 + 144 + 363 = 2365).
A short description of tp, fn, fp, tn:
For class 0 (as positive class):
tp: number of actual examples of class 0, correctly predicted as class 0
fn: number of actual examples of class 0, wrongly predicted as class 1
fp: number of actual examples of class 1, wrongly predicted as class 0
tn: number of actual examples of class 1, correctly predicted as class 1
And it's the other way around in the second line, for class 1: class 0's fp count (438) is class 1's fn count, and class 0's fn count (144) is class 1's fp count, as you can see in your log.
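For example, plugging the class-0 counts from your log into the standard formulas reproduces the reported numbers (plain Python, just a sanity check; the per-class "accuracy" in the log appears to be computed as tp / (tp + fp + fn)):

```python
# Class-0 counts taken from the training.log line:
# "0 tp: 1420 - fp: 438 - fn: 144 - tn: 363 - ..."
tp, fp, fn, tn = 1420, 438, 144, 363

precision = tp / (tp + fp)                                  # 0.7643
recall    = tp / (tp + fn)                                  # 0.9079
f1        = 2 * precision * recall / (precision + recall)   # 0.8299

print(tp + fp + fn + tn)                                    # 2365, the size of the test set
print(round(precision, 4), round(recall, 4), round(f1, 4))

# This matches the per-class "accuracy: 0.7093" in the log,
# which suggests it is tp / (tp + fp + fn) rather than (tp + tn) / total:
print(round(tp / (tp + fp + fn), 4))                        # 0.7093
```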
Upvotes: 1