patri

Reputation: 343

Interpreting training.log in Flair (Zalando Research)

I was playing with the Flair library in order to see if there is a big difference (in terms of results) between fine-tuning (implemented separately) and embedding projection. The problem that I'm facing involves reading the results (in this case, the experiment was done by using BERT embeddings). In the training.log I get this:

2019-10-10 16:27:02,964 Testing using best model ...
2019-10-10 16:27:02,966 loading file best-model.pt

2019-10-10 16:37:23,793 0.7539  0.7539  0.7539

2019-10-10 16:37:23,795

MICRO_AVG: acc 0.605 - f1-score 0.7539
MACRO_AVG: acc 0.5467 - f1-score 0.6925

0 tp: 1420 - fp: 438 - fn: 144 - tn: 363 - precision: 0.7643 - recall: 0.9079 - accuracy: 0.7093 - f1-score: 0.8299
1 tp: 363 - fp: 144 - fn: 438 - tn: 1420 - precision: 0.7160 - recall: 0.4532 - accuracy: 0.3841 - f1-score: 0.5551

2019-10-10 16:37:23,796

My test dataset contains 2365 instances for a binary text classification task. What do the last 2 lines mean? The 0 and 1 followed by the true positives, precision, recall and so on? What is 0? And what is 1? I also loaded separately the best model and tested on my test dataset and I obtained different results.

Any help would be greatly appreciated.

Upvotes: 0

Views: 147

Answers (1)

Ashwin Geet D'Sa

Reputation: 7369

Since you are fine-tuning for binary classification, precision, recall and F1 score are the standard way to evaluate the model, and what you see in the log is exactly that evaluation.

The first character, 0 or 1, indicates the class: class 0 or class 1 (two classes, since it's binary classification). For each class the log reports the number of true positives (tp), false positives (fp), false negatives (fn) and true negatives (tn). If you sum all four counts on one line, the total equals the number of examples in your test set.

A short description of tp,tn,fp,fn:

For class 0 (as positive class):

tp: number of actual examples of class 0, correctly predicted as class 0

fn: number of actual examples of class 0, wrongly predicted as class 1

fp: number of actual examples of class 1, wrongly predicted as class 0

tn: number of actual examples of class 1, correctly predicted as class 1

And it's vice versa for the second line, where class 1 is the positive class.
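A quick way to confirm this mapping is to recompute the logged metrics from the raw counts. A minimal sketch in plain Python, using the class-0 line from the log above (the counts 1420/438/144/363 are taken directly from your output; everything else is standard precision/recall arithmetic):

```python
# Counts from the class-0 line of training.log:
# 0 tp: 1420 - fp: 438 - fn: 144 - tn: 363
tp, fp, fn, tn = 1420, 438, 144, 363

# Sanity check: the four counts cover the entire test set of 2365 instances.
assert tp + fp + fn + tn == 2365

precision = tp / (tp + fp)                           # 1420 / 1858 ~ 0.7643
recall = tp / (tp + fn)                              # 1420 / 1564 ~ 0.9079
f1 = 2 * precision * recall / (precision + recall)   # ~ 0.8299

# Flair's per-class "accuracy" here matches tp / (tp + fp + fn):
class_acc = tp / (tp + fp + fn)                      # 1420 / 2002 ~ 0.7093

# Micro-averaged precision pools the counts over both classes; in the log,
# class 1's tp/fp/fn mirror class 0's counts (363 / 144 / 438).
micro_tp = 1420 + 363
micro_fp = 438 + 144
micro_precision = micro_tp / (micro_tp + micro_fp)   # 1783 / 2365 ~ 0.7539

print(round(precision, 4), round(recall, 4), round(f1, 4),
      round(class_acc, 4), round(micro_precision, 4))
```

The recomputed values line up with the `precision: 0.7643 - recall: 0.9079 - accuracy: 0.7093 - f1-score: 0.8299` figures in the log, which confirms that each line is simply a per-class confusion-matrix summary with that class treated as the positive class.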

Upvotes: 1
