irastol

Reputation: 62

How to crop images using Pillow and pytesseract?

I am trying to use pytesseract to find the bounding box of each letter in an image. Cropping with Pillow using those boxes worked on one image, but when I tried an image with smaller characters (example), the program still recognizes the characters, yet cropping with the box coordinates gives me images like this. I also tried doubling the size of the original image, but that changed nothing.

img = Image.open('imgtest.png')
data=pytesseract.image_to_boxes(img)
dati= data.splitlines()
corde=[]
for i in dati[0].split()[1:5]: #just trying with the first character
    corde.append(int(i))
im=img.crop(tuple(corde))
im.save('cimg.png')

Upvotes: 1

Views: 1453

Answers (1)

HansHirse

Reputation: 18925

Looking at the source code of image_to_boxes, we see that the returned coordinates are in the following order:

left bottom right top

From the documentation on Image.crop, we see that the expected order of coordinates is:

left upper right lower

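It also seems that pytesseract measures the vertical coordinates from the bottom edge of the image, whereas Pillow measures them from the top edge. Therefore, besides swapping top/upper and bottom/lower, we also need to convert both values by subtracting them from the image height.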

That'd be the reworked code:

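from PIL import Image
import pytesseract

img = Image.open('MJwQi9f.png')
data = pytesseract.image_to_boxes(img)
dati = data.splitlines()

# First character only; fields 1..4 are left, bottom, right, top
left, bottom, right, top = (int(i) for i in dati[0].split()[1:5])

# Image.crop expects (left, upper, right, lower) measured from the top edge,
# so flip both vertical values against the image height
height = img.size[1]
corde = (left, height - top, right, height - bottom)
im = img.crop(corde)
im.save('cimg.png')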

As you can see, left and right stay in the same place, but top/upper and bottom/lower switch places and are also recomputed relative to the image height.

And that's the updated output:

Output

The result isn't optimal, but I assume that's due to the font.
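
To crop every recognized character instead of only the first one, the same conversion could be wrapped in a small helper. The following is only a sketch: the function name boxes_to_crop_coords and the sample string are made up for illustration, and it assumes each line of the image_to_boxes text has the form "char left bottom right top page" with the vertical values measured from the bottom edge, as described above.

```python
def boxes_to_crop_coords(box_data, image_height):
    """Turn image_to_boxes-style text into (char, crop_box) pairs.

    Hypothetical helper. Each line is assumed to look like
    "<char> <left> <bottom> <right> <top> <page>".
    """
    result = []
    for line in box_data.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # Image.crop wants (left, upper, right, lower) from the top edge.
        crop_box = (left, image_height - top, right, image_height - bottom)
        result.append((parts[0], crop_box))
    return result


# Made-up sample text for an image that is 100 pixels tall.
sample = "A 10 20 30 60 0\nB 35 20 55 58 0"
crops = boxes_to_crop_coords(sample, image_height=100)
print(crops)
```

With a real image, the first argument would be the string returned by pytesseract.image_to_boxes(img) and the second would be img.size[1]; each returned box can then be passed directly to img.crop.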

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.16299-SP0
Python:        3.9.1
Pillow:        8.1.0
pytesseract:   4.00.00alpha
----------------------------------------

Upvotes: 1
