MilapJ95

Reputation: 1

Tesseract's output is completely wrong and gibberish

I am using the Tesseract (tess-two) library in my Android application for real-time text detection. My code:

public void onPreviewFrame(byte[] data, Camera camera) {
   try {
     // Convert the NV21 preview frame to a JPEG, then decode it into a Bitmap.
     Camera.Size previewSize = camera.getParameters().getPreviewSize();
     YuvImage yuvimage = new YuvImage(data, ImageFormat.NV21, previewSize.width, previewSize.height, null);
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     yuvimage.compressToJpeg(new Rect(0, 0, previewSize.width, previewSize.height), 60, baos);
     byte[] jdata = baos.toByteArray();

     // Note: options is never passed to decodeByteArray, so inSampleSize has no effect here.
     BitmapFactory.Options options = new BitmapFactory.Options();
     options.inSampleSize = 4;
     Bitmap bmp = BitmapFactory.decodeByteArray(jdata, 0, jdata.length);

     // Run OCR on the decoded frame and show the result.
     TessBaseAPI baseApi = new TessBaseAPI();
     baseApi.init(DATA_PATH, lang);
     baseApi.setImage(bmp);
     extractedText = baseApi.getUTF8Text();
     DisplayResult.setText(extractedText);
   }
   catch (Exception e) {
     e.printStackTrace();
   }
}

I have no problem with Tesseract initialisation or with setting the image, but the output is completely wrong; take a look at the image. The TextView (on top of the SurfaceView) displays the Tesseract output.

[Image: Tesseract Output]

How do I solve this problem?

Upvotes: 0

Views: 719

Answers (1)

TheMetal

Reputation: 11

A few things that might improve the accuracy of your output:

  • Crop the image to the area that contains the text before running OCR on it.
  • Exclude punctuation and other characters you do not expect from the recognition.
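
As a rough illustration of both suggestions, here is a minimal plain-Java sketch with no Android dependencies. The class OcrHelpers and its two methods are hypothetical names made up for this example, and the centred crop is only an assumption about where the text sits in the frame. With tess-two itself, the same ideas can probably be expressed more directly with baseApi.setRectangle(left, top, width, height) after setImage, and baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "...") after init; check both against the tess-two version you are using.

```java
// Plain-Java sketch; OcrHelpers and both method names are hypothetical.
final class OcrHelpers {

    /**
     * Returns {left, top, width, height} of a centred crop that covers the
     * given fractions (clamped to 0..1) of the frame. Assumes the text of
     * interest is roughly in the middle of the preview.
     */
    static int[] centeredCrop(int frameWidth, int frameHeight,
                              double widthFraction, double heightFraction) {
        if (frameWidth <= 0 || frameHeight <= 0) {
            throw new IllegalArgumentException("frame size must be positive");
        }
        double wf = Math.min(1.0, Math.max(0.0, widthFraction));
        double hf = Math.min(1.0, Math.max(0.0, heightFraction));
        // Never return an empty rectangle, even for very small fractions.
        int width = Math.max(1, (int) Math.round(frameWidth * wf));
        int height = Math.max(1, (int) Math.round(frameHeight * hf));
        int left = (frameWidth - width) / 2;
        int top = (frameHeight - height) / 2;
        return new int[] {left, top, width, height};
    }

    /**
     * Post-filters recognised text: keeps whitespace and any character that
     * appears in the allowed string, and drops everything else.
     */
    static String keepAllowed(String text, String allowed) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c) || allowed.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The four values from centeredCrop could be fed to Bitmap.createBitmap(bmp, left, top, width, height) before setImage, and keepAllowed could be applied to the string returned by getUTF8Text() if a whitelist inside Tesseract is not an option.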

Upvotes: 1
