Reputation: 4160
I'm converting my OCR application from c++ to java. Using Tess4J I'd like to get the bounding boxes for every word. However, apparently TessResultIterator doesn't provide any methods. So I'd like to if it's possible to somehow get this data?
This is my current code:
TessBaseAPI api = TessAPI1.TessBaseAPICreate();
TessAPI1.TessBaseAPIInit3(api, path, lang);
TessAPI1.TessBaseAPISetPageSegMode(api, TessAPI1.TessPageSegMode.PSM_AUTO);
TessAPI1.TessBaseAPISetImage(api, img, w, h, bpp, bpp*w);
TessAPI1.TessBaseAPIGetUTF8Text(api);
TessResultIterator it = TessAPI1.TessBaseAPIGetIterator(api);
In c++ I could then go on like this:
char* text = it->GetUTF8Text(tesseract::RIL_WORD);
int left, top, right, bttm;
it->BoundingBox(tesseract::RIL_WORD, &left, &top, &right, &bttm);
Upvotes: 2
Views: 7471
Reputation: 41
The following method is taken from the GitHub page for Tess4J, and it shows how to iterate the bounding boxes for each matched word in an input document. It should be easy enough to adapt this code to your own needs, e.g. by using your own path to the Tesseract data directory, as well as path to your own image file.
public void testResultIterator() throws Exception {
File tiff = new File(this.testResourcesDataPath, "eurotext.tif");
BufferedImage image = ImageIO.read(new FileInputStream(tiff));
ByteBuffer buf = ImageIOHelper.convertImageData(image);
int bpp = image.getColorModel().getPixelSize();
int bytespp = bpp / 8;
int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
TessAPI1.TessBaseAPIInit3(handle, datapath, language);
TessAPI1.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO);
TessAPI1.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);
ETEXT_DESC monitor = new ETEXT_DESC();
ITessAPI.TimeVal timeout = new ITessAPI.TimeVal();
timeout.tv_sec = new NativeLong(0L); // time > 0 causes blank ouput
monitor.end_time = timeout;
TessAPI1.TessBaseAPIRecognize(handle, monitor);
TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(handle);
TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
TessAPI1.TessPageIteratorBegin(pi);
int level = TessPageIteratorLevel.RIL_WORD;
do {
Pointer ptr = TessAPI1.TessResultIteratorGetUTF8Text(ri, level);
String word = ptr.getString(0);
TessAPI1.TessDeleteText(ptr);
float confidence = TessAPI1.TessResultIteratorConfidence(ri, level);
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessAPI1.TessPageIteratorBoundingBox(pi, level, leftB, topB, rightB, bottomB);
int left = leftB.get();
int top = topB.get();
int right = rightB.get();
int bottom = bottomB.get();
System.out.print(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));
IntBuffer boldB = IntBuffer.allocate(1);
IntBuffer italicB = IntBuffer.allocate(1);
IntBuffer underlinedB = IntBuffer.allocate(1);
IntBuffer monospaceB = IntBuffer.allocate(1);
IntBuffer serifB = IntBuffer.allocate(1);
IntBuffer smallcapsB = IntBuffer.allocate(1);
IntBuffer pointSizeB = IntBuffer.allocate(1);
IntBuffer fontIdB = IntBuffer.allocate(1);
String fontName = TessAPI1.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB,
monospaceB, serifB, smallcapsB, pointSizeB, fontIdB);
boolean bold = boldB.get() == TRUE;
boolean italic = italicB.get() == TRUE;
boolean underlined = underlinedB.get() == TRUE;
boolean monospace = monospaceB.get() == TRUE;
boolean serif = serifB.get() == TRUE;
boolean smallcaps = smallcapsB.get() == TRUE;
int pointSize = pointSizeB.get();
int fontId = fontIdB.get();
System.out.println(String.format(" font: %s, size: %d, font id: %d, bold: %b,"
+ " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b", fontName, pointSize,
fontId, bold, italic, underlined, monospace, serif, smallcaps));
} while (TessAPI1.TessPageIteratorNext(pi, level) == TRUE);
}
The output from the above script will be some number of lines, one line for each matched word, looking something like this:
SOME_WORD 65 60 120 83 96.072098 font: null, size: 32, font id: -1, bold: false, italic: false, underlined: false, monospace: false, serif: false, smallcap: false
The first four numbers, after the matched word SOME_WORD
are the left, top, right, and bottom coordinates. Following this is the confidence, given as a percentage. Then, there is some metadata about the text itself, including font style information.
Upvotes: 0
Reputation: 8365
Can you try the following code snippet? I did not really test it thoroughly.
TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(api);
TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
String str = TessAPI1.TessResultIteratorGetUTF8Text(ri, TessPageIteratorLevel.RIL_WORD);
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessAPI1.TessPageIteratorBoundingBox(pi, TessPageIteratorLevel.RIL_WORD, leftB, topB, rightB, bottomB);
int left = leftB.get();
int top = topB.get();
int right = rightB.get();
int bottom = bottomB.get();
Upvotes: 1