Stephen

Reputation: 1617

When training and testing a Document AI project, what influences the F1 score?

Using the Cloud Console, I trained a model on one set of data using only one field (to avoid a UI bug that was preventing training altogether). With 50 training images and 50 test images, the model achieved an F1 score of 0.306.

I then added 150 training images, most of them auto-labelled. The auto-labels mostly identified the field's location correctly but were hit and miss on converting the text accurately.

I deployed the model and it scored 0.17.

  1. I am currently reviewing the auto-generated labels and confirming or adjusting them (this improved the score to 0.357, so it seems to be the right step). Is it worthwhile to correct the text transcription (the OCR value) as well? I understand that the "Human in the Loop" step could provide feedback to the system, but are these corrected fields not exported back to the OCR?

  2. I also intend to enlarge the test set. Is it correct that if I correct the OCR value, the corrected value will be used in the test score? Will it be sent back to the system to improve future transcriptions?

  3. Are the size and shape of the identified bounding box part of the F1 score in this product? If so, would using "Select Text" with minor tweaks give the best match to what the model is already looking for? Many of my early boxes were drawn with "Add Bounding Box" and sized to cover the space where handwriting is expected (e.g. including the whitespace around the captured text).

Thank you

Answers (1)

Holt Skinner

Reputation: 2234

The documentation page "Evaluate the performance of processors" defines the F1 score as:

  • F1 score: the harmonic mean of precision and recall, which combines precision and recall into a single metric, providing equal weight to both. Defined as 2 * (Precision * Recall) / (Precision + Recall)
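To make the formula concrete, here is a minimal Python sketch that computes precision, recall and F1 for a single field from true-positive, false-positive and false-negative counts. The function name and the example counts are hypothetical and do not come from Document AI. The second call also rests on an assumption, not on the documentation: it supposes that an extraction counts as correct only when its text matches the label. Under that assumption, a misread OCR value is counted as both a false positive and a false negative, so fixing misreads can move the score noticeably.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return 2 * P * R / (P + R), or 0.0 when precision and recall are both zero."""
    predicted = true_positives + false_positives  # everything the model extracted
    actual = true_positives + false_negatives     # everything that was labeled
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one field across a test set (not from the question).
print(round(f1_score(true_positives=15, false_positives=20, false_negatives=30), 3))  # 0.375

# Same set, assuming 5 misread values are corrected: each fix turns one
# false positive and one false negative into a single true positive.
print(round(f1_score(true_positives=20, false_positives=15, false_negatives=25), 3))  # 0.5
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a field that is found but often misread can score low even when its location is usually correct.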

Regarding your questions about Human-in-the-Loop: the corrected values from human review are not automatically imported into the training/test datasets. They need to be imported into the processor's dataset from the HITL output bucket.
