Reputation: 1
I'm trying to use tesseract and opencv in Python to extract every character from an image and save each character to an individual image file. My code has no problem recognizing the text properly and printing it out, but it's not recognizing the position and size of the individual characters properly. Here's the input image:
https://i.sstatic.net/fYYlu.png
Here's my code:
#=Imports======================================================================
import cv2
import sys
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\User\AppData\Local\Tesseract-OCR\tesseract.exe'
import math
from PIL import ImageGrab
#=Main=Code====================================================================
#Read in image
img = cv2.imread("feldman.png")
#Processing to make the image suitable for OCR
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #Convert image to greyscale
img = cv2.threshold(img, 190, 255, cv2.THRESH_BINARY)[1] #Apply threshold effect
#Perform OCR and print to command line
print("Output from image_to_string():")
print(pytesseract.image_to_string(img))
#Save each character as an image
print("")
print("First character of each line from the output of image_to_boxes():")
hImg, wImg = img.shape #Get the dimensions of the image
boxes = pytesseract.image_to_boxes(img) #Analyzes where boxes would be drawn around each character in an image and creates a string with many lines, one line per box, each line containing data about its box. The data structure for each line/box is: character x1 y1 x2 y2 0 (not sure what the last one is but it's always 0), for example: s 596 164 609 181 0
ROI_number=0 #ROI = "region of interest", it's basically just the index for which character we're on
for b in boxes.splitlines(): #For every line in the string created by image_to_boxes()...
    b = b.split(' ') #Split the line into a list of strings, each string is a separate piece of data. So now, b[0] is character, b[1] is x1, b[2] is y1, b[3] is x2, b[4] is y2, and b[5] is 0
    char, x, y, w, h = b[0], int(b[1]), int(b[2]), int(b[3]), int(b[4]) #Store the pieces of data in variables with names that make sense (see comment in above line)
    print(char, end="") #Print out each character recognized by image_to_boxes()
    x1,y1=hImg-h,hImg-y
    x2,y2=x,w
    roi=img[x1:y1,x2:y2]
    cv2.imwrite("charimages/"+str(ROI_number)+".jpeg",roi) #Save an image file for the character
    ROI_number+=1
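For reference, the coordinate handling in the loop above can be isolated in a small helper. Tesseract's box format lists each box as left, bottom, right, top, measured from the bottom-left corner of the image, while NumPy slicing counts rows from the top, so the vertical values have to be flipped. The sketch below is only an illustration: `box_to_slice` is a hypothetical name (not part of pytesseract), and the image height of 1000 is made up for the demo.

```python
def box_to_slice(box_line, image_height):
    """Convert one image_to_boxes() line into (char, row slice, column slice)."""
    # A line looks like "s 596 164 609 181 0": char, left, bottom, right, top, page.
    fields = box_line.split()
    char = fields[0]
    left, bottom, right, top = (int(value) for value in fields[1:5])
    # Flip the vertical axis: Tesseract measures y upward from the bottom edge,
    # NumPy rows run downward from the top edge.
    rows = slice(image_height - top, image_height - bottom)
    cols = slice(left, right)
    return char, rows, cols


# Hypothetical image height, used only for this demonstration.
char, rows, cols = box_to_slice("s 596 164 609 181 0", image_height=1000)
print(char, rows, cols)
```

With such a helper the crop would be written as `img[rows, cols]`, which selects the same region as the `x1`, `y1`, `x2`, `y2` arithmetic in the loop.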
Here is the output to the command line (which is almost perfectly correct):
Output from image_to_string():
FPT ISBN 0-688-05913-4 >$22.95
IMPONDERABLES
The Solution to the
Mysteries of Everyday Life
David Feldman
Illustrated by Kas Schwan
Did you ever wonder why you never
see baby pigeons? Or why a thumbs-up
gesture means “OK”? At last the solu-
tions to some of life’s most baffling
questions are gathered here in one
volume. Written in an informative
and entertaining style and illustrated
with drawings that are clearly to the
point, Imponderables gets to the bottom
of everyday life’s mysteries, among
them:
* Why is a mile 5,280 feet?
* Which fruits are in Juicy Fruit*®
gum?
* Why does an X stand for a kiss?
* Why don’t cats like to swim?
* Why do other people hear our
voices differently than we do?
Dictionaries, encyclopedias, and
almanacs don’t have the answers—
Imponderables does! And in answering
such questions, it touches on an aston-
ishing variety of subjects, including
(continued on back flap)
First character of each line from the output of image_to_boxes():
FPTISBN0-688-05913-4>$22.95IMPONDERABLESTheSolutiontotheMysteriesofEverydayLifeDavidFeldmanIllustratedbyKasSchwan~Didyoueverwonderwhyyouneverseebabypigeons?Orwhyathumbs-upgesturemeans“OK”?Atlastthesolu-tionstosomeoflife’smostbafflingquestionsaregatheredhereinonevolume.Writteninaninformativeandentertainingstyleandillustratedwithdrawingsthatareclearlytothepoint,Imponderablesgetstothebottomofeverydaylife’smysteries,amongthem:*Whyisamile5,280feet?*WhichfruitsareinJuicyFruit*®gum?*WhydoesanXstandforakiss?*Whydon’tcatsliketoswim?*Whydootherpeoplehearourvoicesdifferentlythanwedo?Dictionaries,encyclopedias,andalmanacsdon’thavetheanswers—Imponderablesdoes!Andinansweringsuchquestions,ittouchesonanaston-ishingvarietyofsubjects,including(continuedonbackflap)~
But when it comes to the output image files, a lot of them are wrong. Some of the images are correct, but a lot of them are just... messed up. Take the image files corresponding to the word "IMPONDERABLES" as an example. There are 13 files, 1 for each character, which makes perfect sense. However, some of the images contain multiple characters:
https://i.sstatic.net/1QtKG.png
As far as I can tell, the problem originates with pytesseract.image_to_boxes(), which recognizes each character correctly but somehow doesn't recognize its position and size correctly. Is there something I can do to make image_to_boxes() more accurate, or is there a different solution entirely?
Upvotes: 0
Views: 2919
Reputation: 5815
Here's a purely OpenCV-based solution. You can produce bounding rectangles
enclosing each character; the tricky part is segmenting each character cleanly. Image resolution is crucial for this. Your image is quite small, and at that DPI
some characters appear to be joined, which is the issue you are facing. Adaptive thresholding seems to alleviate it somewhat, but you would still benefit from higher-resolution images.
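These are the steps:
1. Resize your input image by 2X, because, again, it is pretty small
2. Convert it to grayscale and apply adaptive thresholding to get a binary mask
3. Detect the external contours and compute their bounding rectangles
4. Filter the rectangles by aspect ratio and area to discard noise
5. Crop each character using numpy slicing
Let's see the code: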
# Imports:
import numpy as np
import cv2
# Image path
path = "D://opencvImages//"
fileName = "fYYlu.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Scale image:
scaleFactor = 2
inputImage = cv2.resize(inputImage, None, fx=scaleFactor, fy=scaleFactor, interpolation=cv2.INTER_LINEAR)
# Deep Copy:
inputImageCopy = inputImage.copy()
# Convert BGR (OpenCV's default channel order) to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Adapt. threshold:
windowSize = 41
constantValue = 8
binaryImage = cv2.adaptiveThreshold(grayscaleImage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,
                                    windowSize, constantValue)
So far I've managed to get this binary mask:
The mask is pretty good. There is some noise and some characters are apparently joined, but let's work with this. Next, let's detect contours and apply a blob filter to ignore noise. The noise blobs are either very small or very large, so let's set lower and upper area thresholds to discard them. Additionally, the characters seem to be almost square, in the sense that their width/height ratio is fairly close to 1.0:
# Find the EXTERNAL contours on the binary image:
contours, _ = cv2.findContours(binaryImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Look for the outer bounding boxes (no children):
for c in contours:
    # Get the bounding rectangle and its dimensions:
    rectX, rectY, rectWidth, rectHeight = cv2.boundingRect(c)
    # Compute the bounding-box area (used as the blob size):
    contourArea = rectHeight * rectWidth
    # Compute aspect ratio and its distance from the reference ratio:
    referenceRatio = 1.0
    contourRatio = rectWidth / rectHeight
    epsilon = 1.1
    ratioDifference = abs(referenceRatio - contourRatio)
    print((ratioDifference, contourArea))
    # Red color, filtered blobs:
    color = (0, 0, 255)
    # Apply contour filter:
    if ratioDifference <= epsilon:  # Aspect Ratio
        minArea = 50 * scaleFactor
        maxArea = 120 * minArea
        if minArea <= contourArea < maxArea:  # Area Filter
            # Crop contour:
            croppedChar = inputImage[rectY:rectY + rectHeight, rectX:rectX + rectWidth]
            cv2.imshow("Cropped Character", croppedChar)
            cv2.waitKey(0)
            # Green Color, detected blobs:
            color = (0, 255, 0)
    # Draw the rectangle on the copy of the input image:
    cv2.rectangle(inputImageCopy, (rectX, rectY),
                  (rectX + rectWidth, rectY + rectHeight), color, 2)
# (Optional) Show image:
cv2.imshow("Bounding Rectangle", inputImageCopy)
cv2.waitKey(0)
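The two filters used in the loop can also be expressed as a standalone predicate, which makes the thresholds easier to experiment with. This is only a sketch: `is_character_blob` and its parameter names are hypothetical, and the default values simply mirror the constants used above.

```python
def is_character_blob(width, height, scale_factor=2,
                      reference_ratio=1.0, epsilon=1.1,
                      min_area_per_scale=50, max_area_factor=120):
    """Return True when a bounding box passes the aspect-ratio and area filters."""
    if height <= 0:
        return False
    # Aspect-ratio filter: distance of width/height from the reference ratio.
    ratio_difference = abs(reference_ratio - width / height)
    if ratio_difference > epsilon:
        return False
    # Area filter: keep boxes whose area lies in [min_area, max_area).
    min_area = min_area_per_scale * scale_factor
    max_area = max_area_factor * min_area
    return min_area <= width * height < max_area


print(is_character_blob(12, 20))   # typical letter-sized box
print(is_character_blob(3, 3))     # tiny speck of noise
print(is_character_blob(100, 10))  # long horizontal streak
```

Note that with a reference ratio of 1.0 and an epsilon of 1.1, the aspect-ratio test accepts any box whose width is at most 2.1 times its height, so narrow characters such as "l" or "i" pass and only wide blobs are rejected by it.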
This image shows the "valid" character boxes in green and the filtered ones in red:
You can tune the results by adjusting the adaptive threshold parameters. This GIF
shows the results for ascending, odd window size (WS) values. The higher the window size the better, up to the point at which characters start joining into bigger clusters:
Upvotes: 1