atheesh

Reputation: 23

Bounding boxes around characters for tesseract 4.0.0-beta.1

I am trying to do number plate recognition using tesseract 4.0.0-beta.1. The tesseract documentation says to create box files in the form . I tried the "makebox" function, but it does not detect every character properly. I then read somewhere that this function is meant for version 3.x.

I later tried the "wordstrbox" function, but the box file created this way is empty. Can someone tell me how to create box files for tesseract 4.0.0-beta.1?

Upvotes: 1

Views: 8187

Answers (2)

Alex W.

Reputation: 174

I found AlfiyaFaisy's answer very helpful and just wanted to share the code to view the bounding boxes of single characters. The difference lies in the keys of the dictionary returned by the image_to_boxes method:

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('image.png')
height = img.shape[0]

# image_to_boxes returns one entry per recognized character
d = pytesseract.image_to_boxes(img, output_type=Output.DICT)
n_boxes = len(d['char'])
for i in range(n_boxes):
    (text, x1, y2, x2, y1) = (d['char'][i], d['left'][i], d['top'][i], d['right'][i], d['bottom'][i])
    # Box coordinates are measured from the bottom-left corner, so flip y for OpenCV
    cv2.rectangle(img, (x1, height - y1), (x2, height - y2), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)

At least on my machine (Python 3.6.8, cv2 4.1.0) the cv2 method is waitKey(0) with a capital K.

This is the output I got:

[image: output]
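
The height - y flip in the loop above is there because image_to_boxes reports coordinates measured from the bottom-left corner of the image, while OpenCV measures from the top-left. A minimal sketch of that conversion as a standalone helper; the name box_to_cv_rect is made up for illustration and is not part of pytesseract or cv2:

```python
def box_to_cv_rect(left, bottom, right, top, image_height):
    """Convert one box (origin at bottom-left) into the two corner
    points that cv2.rectangle expects (origin at top-left)."""
    top_left = (left, image_height - top)
    bottom_right = (right, image_height - bottom)
    return top_left, bottom_right


# A box spanning x 10..30 and y 20..50 in an image 100 pixels high
print(box_to_cv_rect(10, 20, 30, 50, 100))  # ((10, 50), (30, 80))
```

The two returned points can be passed straight to cv2.rectangle as its second and third arguments.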

Upvotes: 5

AlfiyaFaisy

Reputation: 444

Use pytesseract.image_to_data()

import pytesseract
import cv2
from pytesseract import Output

img = cv2.imread('image.jpg')
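# image_to_data returns one entry per detected element (block, line, word, ...)
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (text, x, y, w, h) = (d['text'][i], d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # left/top are measured from the top-left corner, so no y flip is needed
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)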
cv2.imshow('img',img)
cv2.waitKey(0)

Among the data returned by pytesseract.image_to_data():

  • left is the distance from the upper-left corner of the bounding box to the left border of the image.
  • top is the distance from the upper-left corner of the bounding box to the top border of the image.
  • width and height are the width and height of the bounding box.
  • conf is the model's confidence in its prediction for the word within that bounding box. A conf of -1 means the bounding box contains a block of text rather than a single word.
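
Since entries with a conf of -1 are not single words, it can be handy to filter them out before drawing. A rough sketch, assuming the dictionary has the keys listed above; the helper name word_boxes and the sample values are made up for illustration, and conf is converted with float() because it may arrive as a string or a number depending on the pytesseract version:

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, left, top, width, height) for entries that look like real words."""
    boxes = []
    for i, text in enumerate(data['text']):
        conf = float(data['conf'][i])  # may be str, int, or float
        # Skip block/line-level entries (conf == -1) and empty text
        if conf < min_conf or not text.strip():
            continue
        boxes.append((text, data['left'][i], data['top'][i],
                      data['width'][i], data['height'][i]))
    return boxes


# Hand-written sample shaped like image_to_data(..., output_type=Output.DICT)
sample = {
    'text': ['', 'KA', '01'],
    'conf': ['-1', '91', '87'],
    'left': [0, 12, 60],
    'top': [0, 8, 8],
    'width': [200, 40, 38],
    'height': [50, 30, 30],
}
print(word_boxes(sample))
```

With real data, replace sample with the dictionary d from the code above and loop over the result to draw only the word-level rectangles.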

The bounding boxes returned by pytesseract.image_to_boxes() enclose individual letters, so I believe pytesseract.image_to_data() is what you're looking for.

Upvotes: 5
