Reputation: 5805
Starting with this sample project [ https://github.com/googlesamples/android-vision/tree/master/visionSamples/ocr-reader ], I have been able to implement filtering in the OcrDetectorProcessor.receiveDetections() method.
This works, but com.google.android.gms.vision.text.TextRecognizer appears to search the entire screen for characters.
I presume that the receiveDetections() method could be called more frequently if a smaller portion of the screen were being scanned for characters instead of the entire screen.
Is it possible to specify a smaller portion of the screen to be scanned? It should be straightforward to direct the user, through a change to the graphic overlay, to position their camera so that this smaller portion of the screen contains the target text, but I'm unsure how to tell the processor to use just a small portion of the frame when doing its OCR processing.
What would need to be altered to specify that the OCR should operate on a subset of the frame?
ADDITIONAL INFORMATION:
I tried to subclass TextRecognizer, but it's marked final, and the source appears to be closed.
So I'm expanding the question to how the functionality of the ocr-reader sample could be replicated using Tesseract.
I found this link, but haven't explored converting the concepts there into camera frames as opposed to a single image file.
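If it helps frame the question, this is roughly the conversion I have in mind (untested sketch; the helper name and parameters are mine): a camera preview callback delivers an NV21 byte array, which can be cropped and decoded into a Bitmap that Tesseract can consume:

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.ImageFormat;
import android.graphics.Rect;
import android.graphics.YuvImage;
import java.io.ByteArrayOutputStream;

// Untested helper: crop an NV21 preview frame to a region of interest and decode it to a Bitmap
public final class FrameCropper {

    private FrameCropper() {}

    public static Bitmap cropPreviewFrame(byte[] nv21, int frameWidth, int frameHeight, Rect roi) {
        YuvImage yuv = new YuvImage(nv21, ImageFormat.NV21, frameWidth, frameHeight, null);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        yuv.compressToJpeg(roi, 90, out); // compressToJpeg applies the crop rectangle
        byte[] jpeg = out.toByteArray();
        return BitmapFactory.decodeByteArray(jpeg, 0, jpeg.length);
    }
}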
Upvotes: 1
Views: 900
Reputation: 61
I had a similar issue and resolved it by using Tesseract and a simple cropping library called "Android Image Cropper" (link here).
Basically I just crop the image before passing it for processing. Here is a small sample of my code.
This line will start a new activity for a result:
CropImage.activity().setGuidelines(CropImageView.Guidelines.ON).start((Activity) view.getContext());
After that you just need to override onActivityResult. My solution looks like this:
@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (resultCode == RESULT_OK) {
        if (requestCode == CropImage.CROP_IMAGE_ACTIVITY_REQUEST_CODE) {
            // Retrieve the crop result and decode the cropped image from its URI
            CropImage.ActivityResult result = CropImage.getActivityResult(data);
            Bitmap bmp = null;
            try {
                InputStream is = context.getContentResolver().openInputStream(result.getUri());
                BitmapFactory.Options options = new BitmapFactory.Options();
                bmp = BitmapFactory.decodeStream(is, null, options);
            } catch (Exception ex) {
                Log.i(getClass().getSimpleName(), ex.getMessage());
                Toast.makeText(context, errorConvert, Toast.LENGTH_SHORT).show();
            }
            // Show the cropped image and run OCR on it
            ivImage.setImageBitmap(bmp);
            doOCR(bmp);
        }
    }
}
As you can see, at the end I am passing the already cropped image for OCR in the doOCR() method. You can just pass it to your OCR function and it should work like a charm.
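For reference, here is a minimal sketch of what a doOCR() method along these lines could look like using the tess-two wrapper (com.googlecode.tesseract.android.TessBaseAPI); it is not my actual implementation. DATA_PATH and the "eng" language choice are assumptions, and the corresponding tessdata/eng.traineddata file must already be on the device:

import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

// Minimal sketch of an OCR call with tess-two, not the exact doOCR() used above
private String doOCR(Bitmap bitmap) {
    TessBaseAPI tess = new TessBaseAPI();
    tess.init(DATA_PATH, "eng"); // DATA_PATH must contain a "tessdata" folder with eng.traineddata
    tess.setImage(bitmap);
    String recognizedText = tess.getUTF8Text();
    tess.end(); // release native resources
    return recognizedText;
}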
If you plan to do something similar, don't forget to add the dependency:
//Crop library dependency
api 'com.theartofdev.edmodo:android-image-cropper:2.8.+'
And also add the following inside the <application> element of your Manifest file:
<activity android:name="com.theartofdev.edmodo.cropper.CropImageActivity"
    android:theme="@style/Base.Theme.AppCompat"/>
Hope this helped and good luck :)
Upvotes: 1