Reputation: 51
Is there any way to read an image line by line with Tesseract and get the coordinates of each line? Normally I read each word: Tesseract returns a dictionary from which I can get every word's position, but there seems to be no option for line coordinates. I'm using psm 6 to read line by line, but even with it I only receive the words' coordinates.
d = pytesseract.image_to_data(img, lang="eng", output_type=Output.DICT)
Upvotes: 1
Views: 3882
Reputation: 2854
You can group the words that belong to each line and compute the line's bounding box from the bounding boxes of its left-most and right-most words. Below is a Python implementation that groups the words for each line.
import pytesseract
from pytesseract import Output

# img is the image from the question, already loaded by the caller
text = pytesseract.image_to_data(img, lang="eng", output_type=Output.DICT)

# Group word boxes by (block, paragraph, line): line_num restarts in every
# paragraph, so block_num and line_num alone can merge unrelated lines.
lines = {}
for i in range(len(text['text'])):
    txt = text['text'][i]
    if txt == '' or txt.isspace():
        continue  # skip the rows that carry no word text
    key = (text['block_num'][i], text['par_num'][i], text['line_num'][i])
    left, top = text['left'][i], text['top'][i]
    width, height = text['width'][i], text['height'][i]
    lines.setdefault(key, []).append((txt, left, top, width, height))

# The line box is the union of its word boxes, so tall or low words
# in the middle of the line are covered too.
for line_idx, words in enumerate(lines.values(), start=1):
    xmin = min(w[1] for w in words)
    ymin = min(w[2] for w in words)
    xmax = max(w[1] + w[3] for w in words)
    ymax = max(w[2] + w[4] for w in words)
    print("Line {} : {}, {}, {}, {}".format(line_idx, xmin, ymin, xmax, ymax))
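As a possible shortcut, the dictionary returned by image_to_data also has a level column, and rows with level 4 describe whole lines (5 is the word level), so their left/top/width/height can be read directly. The sketch below assumes that layout; the helper name line_boxes is hypothetical, and it works on the dictionary only, so it does not call Tesseract itself.

```python
def line_boxes(data):
    """Collect (xmin, ymin, xmax, ymax) for every line-level row.

    `data` is assumed to be the dict produced by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    LINE_LEVEL = 4  # 1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word
    boxes = []
    for i, level in enumerate(data["level"]):
        if level != LINE_LEVEL:
            continue
        left, top = data["left"][i], data["top"][i]
        width, height = data["width"][i], data["height"][i]
        boxes.append((left, top, left + width, top + height))
    return boxes


# Usage, assuming `d` is the dictionary from the question:
# for n, box in enumerate(line_boxes(d), start=1):
#     print("Line {} : {}, {}, {}, {}".format(n, *box))
```

Line-level rows have an empty text field, so this gives coordinates only; to get the text of each line as well, the grouping approach above is still needed.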
Upvotes: 5