Reputation: 1
I have millions of images, and I am able to use OCR with pytesseract to perform decent text extraction, but it takes too long to process all of the images.
Thus I would like to determine whether an image contains text at all, so that I can skip OCR on the images that don't. Ideally this method would have high recall.
I was thinking about building an SVM or some other machine learning model to do the detection, but I was hoping someone knew of a method to quickly determine whether an image contains text.
Upvotes: 0
Views: 649
Reputation: 55
Unfortunately, there is no way to tell whether an image has text in it without performing OCR of some kind on it.
You could build a machine learning model that handles this; however, keep in mind that it would still need to process every image as well.
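If you do go the model route, the per-image processing can at least be kept cheap. Below is a minimal sketch of one feature such a model could use, not a real text detector: the fraction of neighbouring pixels with a sharp brightness change, since printed text tends to produce many of them. The names `edge_density` and `might_contain_text` and both threshold values are hypothetical and would need tuning on a labelled sample of your images. It assumes the image has already been loaded as a grayscale grid (a list of rows of 0-255 integers), and it is written in plain Python for clarity rather than speed.

```python
def edge_density(gray, min_jump=40):
    """Fraction of horizontally adjacent pixel pairs whose brightness
    differs by at least min_jump (hypothetical default, needs tuning)."""
    jumps = 0
    pairs = 0
    for row in gray:
        # Compare each pixel with its right-hand neighbour.
        for left, right in zip(row, row[1:]):
            pairs += 1
            if abs(left - right) >= min_jump:
                jumps += 1
    return jumps / pairs if pairs else 0.0


def might_contain_text(gray, min_density=0.02):
    """Cheap pre-filter: True means 'send this image to OCR'.

    min_density is an assumed value; keeping it low favours recall,
    so fewer text images get skipped at the cost of more false positives.
    """
    return edge_density(gray) >= min_density


# Usage: a flat image is skipped, a high-contrast one is passed on to OCR.
blank = [[200] * 8 for _ in range(8)]
stripes = [[0, 255] * 4 for _ in range(8)]
print(might_contain_text(blank), might_contain_text(stripes))
```

Any busy, high-contrast image (foliage, fabric, fences) will also pass this filter, so it only helps if a good share of your images are relatively smooth. Images that pass would still go through pytesseract as before.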
Upvotes: 2