aws sagemaker for detecting text in an image

Question

I am aware that it is better to use aws Rekognition for this. However, it does not seem to work well when I tried it out with the images I have (which are sort of like small containers with labels on them). The text comes out misspelled and fragmented.

I am new to ML and sagemaker. From what I have seen, the use cases seem to be for prediction and image classification. I could not find one on training a model for detecting text in an image. Is it possible to to do it with Sagemaker? I would appreciate it if someone pointed me in the right direction.

Nick Walsh · Accepted Answer

The different services will all provide different levels of abstraction for Optical Character Recognition (OCR) depending on what parts of the pipeline you are most comfortable with working with, and what you prefer to have abstracted.

Here are a few options:

Rekognition will provide out of the box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case in order to get better results. This can be done through any method of your choice (Lambda, EC2, etc).
SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
1. Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
2. Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
Amazon Textract (In preview) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.

I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!

aws sagemaker for detecting text in an image

Answers (1)

Related Questions