Calculate a character-wise confusion matrix for OCR evaluation?

Question

I am working on an OCR task and, for evaluation purposes, want to calculate a confusion matrix for my model. It should show how often each character is predicted correctly and how often it is predicted as another character (and which ones!).

My current problem is that a simple pairwise comparison is difficult because of string-length mismatches and additional or missing characters (mainly whitespace). I was thinking about using the Levenshtein distance algorithm to also record how often a character would need to be inserted or deleted, but I'm still unsure how to handle that.
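
To make the idea concrete, here is a minimal sketch of what I have in mind, not a reference implementation. It aligns each ground-truth string with the predicted string by backtracking through the Levenshtein table and then counts the aligned character pairs. The names `align`, `confusion_counts` and the `GAP` placeholder (standing in for "no character" on an insertion or deletion) are my own choices for illustration, and the unit cost for every edit is an assumption.

```python
from collections import Counter

# Hypothetical placeholder meaning "no character here" (insertion or deletion).
GAP = "<eps>"


def align(ref, hyp):
    """Return (ref_char, hyp_char) pairs from one minimum edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = Levenshtein distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # character missing in the prediction
                dist[i][j - 1] + 1,         # extra character in the prediction
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], GAP))  # deletion
            i -= 1
        else:
            pairs.append((GAP, hyp[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(samples):
    """Count aligned (true_char, predicted_char) pairs over (ref, hyp) samples."""
    counts = Counter()
    for ref, hyp in samples:
        counts.update(align(ref, hyp))
    return counts
```

With this, `confusion_counts([("cat", "cot")])` would count `("a", "o")` as a substitution, and rows or columns keyed by `GAP` would hold the insertions and deletions. When several alignments have the same cost, this sketch just picks one of them (preferring match/substitution), which is part of what I am unsure about.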

Are there any state-of-the-art approaches that are commonly used for this? I did some research, but couldn't find anything significant.

Answers (1)