Reputation: 61
I want to draw a bounding box around each question and around that question's respective options, then extract the text from each box and put it into a pandas DataFrame, which will be exported to Excel later (a sketch of that step is shown after the code below). For this I have a Python file that detects the question and the four options [(a), (b), (c), (d)]. The problem is that when I run PyTesseract on the whole image (without any bounding boxes) it gives the desired output, but when I extract text from the bounding boxes it makes a lot of errors in text detection. I've attached my Python file below. Can someone tell me how to correctly detect text from these bounding boxes?
Python Code:
import cv2
import pytesseract as tess

# read the image using OpenCV (raw string so the backslash is not treated as an escape)
image = cv2.imread(r"E:\PythonTarget.jpg")
# make a copy of this image to draw on
image_copy = image.copy()

# the target words to search for
target_word_a = "(a)"
target_word_b = "(b)"
target_word_c = "(c)"
target_word_d = "(d)"

# get all data from the image
data = tess.image_to_data(image, output_type=tess.Output.DICT)

# get all occurrences of each target word
word_occurrences_a = [i for i, word in enumerate(data["text"]) if word.lower() == target_word_a]
word_occurrences_b = [i for i, word in enumerate(data["text"]) if word.lower() == target_word_b]
word_occurrences_c = [i for i, word in enumerate(data["text"]) if word.lower() == target_word_c]
word_occurrences_d = [i for i, word in enumerate(data["text"]) if word.lower() == target_word_d]

for occ in word_occurrences_a:
    # extract the width, height, top and left position for that detected word
    # (the width is padded so the box extends to the right of the "(a)" marker and covers the option text)
    w = data["width"][occ] + 1000
    h = data["height"][occ]
    l = data["left"][occ]
    t = data["top"][occ]
    # define all the surrounding box points
    p1 = (l, t)
    p2 = (l + w, t)
    p3 = (l + w, t + h)
    p4 = (l, t + h)
    # draw the 4 lines (rectangle)
    image_copy = cv2.line(image_copy, p1, p2, color=(255, 0, 0), thickness=4)
    image_copy = cv2.line(image_copy, p2, p3, color=(255, 0, 0), thickness=4)
    image_copy = cv2.line(image_copy, p3, p4, color=(255, 0, 0), thickness=4)
    image_copy = cv2.line(image_copy, p4, p1, color=(255, 0, 0), thickness=4)
    # crop the bounding box region from the annotated copy
    crop = image_copy[t:t + h, l:l + w]
    # extract text from the cropped image
    results = tess.image_to_string(crop)
    # print the extracted text
    print(results)
Upvotes: 1
Views: 1817
Reputation: 8005
You could use image_to_data to draw the bounding boxes. You should also try the page segmentation modes (--psm). For instance, if you set psm to 6, Tesseract assumes the image is a single uniform block of text:
Code:
# Load the libraries
import cv2
import pytesseract

# Load the image
img = cv2.imread("Uewxg.jpg")

# Convert it to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR detection
d = pytesseract.image_to_data(gry, config="--psm 6", output_type=pytesseract.Output.DICT)

# Get the number of detected parts
n_boxes = len(d['level'])

# For each detected part
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Draw a rectangle around the detected region
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)
    # Crop the region from the gray-scale image
    crp = gry[y:y + h, x:x + w]
    # OCR the cropped region
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    print(txt)
    # Display the cropped image
    cv2.imshow("crp", crp)
    cv2.waitKey(0)

# Display the annotated image
cv2.imshow("img", img)
cv2.waitKey(0)
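Note that image_to_data returns one row per page, block, paragraph, line, and word, so the loop above draws nested boxes and OCRs many overlapping crops. If you only want boxes that actually contain text, you could filter the rows first (a minimal sketch under that assumption; level 5 is the word level in Tesseract's output):

# keep only word-level detections that actually contain text
for i in range(n_boxes):
    if d['level'][i] == 5 and d['text'][i].strip():
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)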
Upvotes: 2