Is cursive character segmentation even possible with this strip-wise OCR (or other classification model) technique?
I need to extract instances of characters from a sample of a user's handwritten text. In effect, I am trying to build a database, from the submitted sample, of how the user writes each specific letter. This is my attempt at the problem of cursive character segmentation.
My approach to the problem:
- I take strips of the image that all start at its left edge and gradually widen until the entire image is covered.
- I run an OCR model on each of these strips.
- I should see a spike in the model's "confidence" whenever a character lies completely inside the window.
- From those spikes, I should get the pixel positions where each new character starts.
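The scanning loop can be sketched roughly as follows. This is a minimal sketch under my own assumptions: `score` is a hypothetical callable that crops the image to columns `[0, width)`, runs whatever OCR or classification model is in use, and returns a single confidence value, and a "spike" is taken to mean a strict local maximum of that confidence.

```python
from typing import Callable, List, Sequence, Tuple


def strip_widths(image_width: int, step: int) -> List[int]:
    """Widths of left-anchored strips, growing by `step` until the full width is covered."""
    widths = list(range(step, image_width, step))
    widths.append(image_width)  # the last strip is always the whole image
    return widths


def scan_strips(
    image_width: int, step: int, score: Callable[[int], float]
) -> List[Tuple[int, float]]:
    """Score every strip [0, width) and return (width, confidence) pairs in order."""
    return [(w, score(w)) for w in strip_widths(image_width, step)]


def confidence_peaks(series: Sequence[Tuple[int, float]]) -> List[int]:
    """Strip widths whose confidence is strictly higher than both neighbours."""
    peaks = []
    for i in range(1, len(series) - 1):
        width, conf = series[i]
        if conf > series[i - 1][1] and conf > series[i + 1][1]:
            peaks.append(width)
    return peaks


if __name__ == "__main__":
    # Made-up confidences standing in for a real model, one per strip width.
    fake = {10: 0.2, 20: 0.9, 30: 0.3, 40: 0.4, 50: 0.8, 60: 0.5}
    print(confidence_peaks(scan_strips(60, 10, fake.__getitem__)))  # -> [20, 50]
```

With a NumPy image, `score` could be something like `lambda w: ocr_confidence(image[:, :w])`, where `ocr_confidence` is a hypothetical wrapper around the model. Strict local maxima miss flat plateaus, so a real confidence curve may need smoothing or a tolerance.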
The idea seemed theoretically sound.
I found that a similar method had been presented on Stack Exchange before.
I implemented the idea and the results were pretty good; they should improve substantially with a larger, better OCR model, or with a model trained specifically on digit recognition.
I expected it to work on handwritten characters as well, since TrOCR, unlike Tesseract, is also meant to handle handwriting.
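TrOCR does not return a single confidence number, so one has to be derived. A common choice, and an assumption here, is the geometric mean of the generated tokens' probabilities. The sketch below uses the Hugging Face `transformers` generation API (`generate` with `output_scores=True` and `return_dict_in_generate=True`, followed by `compute_transition_scores`); the checkpoint name in the docstring and the `max_new_tokens` cap are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability, exp(mean log p); 0.0 for an empty sequence."""
    if len(token_logprobs) == 0:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def trocr_confidence(image, processor, model) -> Tuple[str, float]:
    """Decode one PIL image with TrOCR and return (text, confidence).

    Assumes `processor` and `model` were loaded from Hugging Face transformers, e.g.
    TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten") and
    VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").
    """
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    out = model.generate(
        pixel_values,
        num_beams=1,  # greedy decoding, so no beam_indices are needed below
        max_new_tokens=64,  # assumed cap; long sentence strips may need more
        output_scores=True,
        return_dict_in_generate=True,
    )
    # Log-probability of each generated token (including the end-of-sequence token).
    logprobs = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )
    text = processor.batch_decode(out.sequences, skip_special_tokens=True)[0]
    return text, sequence_confidence(logprobs[0].tolist())
```

Averaging in log space keeps the score comparable between strips that decode to different numbers of tokens, which a raw product of probabilities would not.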
When I ran it again on handwriting, I discovered the following:
- Truncated words were completed: "Quic" (a slice of "Quick") was read as "Quick". TrOCR's text decoder, which behaves much like a language model, is not easily convinced that a word ends midway.
- Overall, as shown below, the confidence score trends upward as the strip covers more of the sentence.
Essentially, once the model is fairly sure the earlier words are correct, variations in the probability of later characters have less and less effect on the score. Small local maxima are still visible, though.
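If the score is an average of per-token log-probabilities (as in a geometric-mean definition of confidence), a change of δ in one token's log-probability moves the average over n tokens by only δ/n, which would explain the flattening. One way to make the small local maxima easier to pick out, offered as a suggestion rather than something validated on this data, is to subtract a local moving-average trend and look for peaks in the residual. `window` and `min_height` are hypothetical tuning parameters.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def detrended_peaks(
    values: Sequence[float], window: int = 5, min_height: float = 0.0
) -> List[int]:
    """Indices where (value - local trend) is a strict local maximum above min_height."""
    trend = moving_average(values, window)
    resid = [v - t for v, t in zip(values, trend)]
    return [
        i
        for i in range(1, len(resid) - 1)
        if resid[i] > resid[i - 1] and resid[i] > resid[i + 1] and resid[i] > min_height
    ]
```

The returned indices refer to positions in the list of strips, so they map back to strip widths in the same order the strips were scored.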
The second problem could be solved by segmenting the text into words first and then running TrOCR on each word. This fails spectacularly: it turns out TrOCR is terrible at recognising words without context. When I sent it a slice of the word "dog", it began returning words like "diores".
I also ran TrOCR on individual characters, and it fails there too.
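For reference, a simple way to do the word segmentation step is a vertical projection profile: a column with no ink is background, and a run of enough empty columns marks a gap between words. This sketch assumes a binarized line image given as rows of 0/1 values (1 = ink) and a hypothetical `min_gap` threshold; it is not necessarily how the segmentation above was done, and slanted cursive can leave no clean vertical gap between words.

```python
from typing import List, Sequence, Tuple


def word_spans(binary: Sequence[Sequence[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a binarized text-line image (1 = ink) into [start, end) column spans,
    treating runs of at least `min_gap` ink-free columns as word boundaries."""
    width = len(binary[0]) if binary else 0
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None  # first column of the current word, or None between words
    last_ink = -1  # last column containing ink in the current word
    gap = 0  # length of the current run of empty columns
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x
            last_ink = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, last_ink + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans
```

Each span can then be cropped out as `image[:, start:end]` and passed to the recognizer on its own.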
My questions:
- Would a CNN- or contrastive-learning-based character classification model work better? If so, where can I find a dataset of segmented cursive characters?
- Is there any way to keep using TrOCR so that it does not care about word meanings?
- Can this method hope to outperform recent segmentation techniques such as this?