I am following the https://github.com/mindee/doctr GitHub repo to detect text, and I have converted the text coordinates into absolute coordinates in the [xmin, ymin, xmax, ymax] format. I want to draw the bounding boxes using these values and save the cropped images to a folder. How can I do that?
```python
import json

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# Image
doc = DocumentFile.from_images("/content/passbook_64_0.jpeg")
# Analyze
result = model(doc)

# Export results in JSON
with open("/content/preds.json", "w") as f:
    json.dump(result.export(), f)

export = result.export()
# Flatten the export
page_words = [[word for block in page['blocks'] for line in block['lines'] for word in line['words']] for page in export['pages']]
page_dims = [page['dimensions'] for page in export['pages']]
# Get the coords in [xmin, ymin, xmax, ymax]
words_abs_coords = [
    [[int(round(word['geometry'][0][0] * dims[0])), int(round(word['geometry'][0][1] * dims[1])), int(round(word['geometry'][1][0] * dims[0])), int(round(word['geometry'][1][1] * dims[1]))] for word in words]
    for words, dims in zip(page_words, page_dims)
]
print(words_abs_coords)
```
The absolute coordinates obtained from the above code are:
[[[33, 108, 57, 135], [54, 107, 81, 136], [189, 110, 205, 141], [205, 112, 221, 141], [222, 114, 230, 141], [230, 112, 247, 141], [11, 173, 39, 196], [41, 175, 68, 196], [71, 175, 87, 198], [90, 177, 116, 198], [215, 179, 256, 199], [26, 204, 35, 225], [10, 203, 25, 227], [89, 204, 131, 228], [214, 207, 256, 227], [54, 228, 57, 236], [11, 225, 38, 246], [41, 224, 53, 247], [90, 225, 129, 245], [11, 244, 42, 265], [45, 245, 64, 267], [82, 246, 102, 267], [67, 246, 79, 268], [104, 247, 127, 268], [13, 301, 87, 324], [90, 303, 113, 323], [12, 327, 60, 349], [63, 331, 69, 347], [84, 331, 125, 349], [70, 328, 80, 351], [214, 334, 259, 356], [61, 360, 108, 378], [41, 357, 59, 382], [130, 360, 160, 381], [111, 359, 128, 382], [214, 362, 282, 386], [41, 388, 62, 411], [63, 388, 84, 411], [85, 388, 106, 411], [108, 387, 131, 410], [213, 392, 237, 415], [239, 393, 276, 418], [11, 415, 34, 439], [213, 419, 230, 444], [231, 419, 241, 444], [244, 422, 286, 447], [11, 443, 34, 467], [208, 441, 252, 477], [259, 451, 287, 476], [11, 474, 34, 497], [52, 471, 80, 496], [38, 470, 51, 498], [215, 478, 274, 501], [10, 501, 30, 525], [207, 505, 267, 531], [49, 531, 123, 555], [11, 536, 27, 559], [29, 534, 46, 562], [204, 536, 233, 560], [234, 538, 259, 562]]]
```python
import cv2
import matplotlib.pyplot as plt

# Load the same image that was passed to the OCR model
image = cv2.imread("/content/passbook_64_0.jpeg")
im_height, im_width, _ = image.shape
# First box of the first page
xmin, ymin, xmax, ymax = words_abs_coords[0][0]
image1 = cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
# OpenCV loads images as BGR, matplotlib expects RGB
plt.imshow(cv2.cvtColor(image1, cv2.COLOR_BGR2RGB))
plt.show()
```
For anyone finding this thread: I believe the answer was already provided in the dedicated GitHub discussion here: https://github.com/mindee/doctr/discussions/570
I think the only part to change in the snippet is this:
```python
words_abs_coords = [
    [[int(round(word['geometry'][0][0] * dims[0])), int(round(word['geometry'][0][1] * dims[1])), int(round(word['geometry'][1][0] * dims[0])), int(round(word['geometry'][1][1] * dims[1]))] for word in words]
    for words, dims in zip(page_words, page_dims)
]
```
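As pointed out in the discussion, the page dimensions are used in the wrong order: doctr exports `dimensions` as (height, width), while each `geometry` is ((xmin, ymin), (xmax, ymax)) relative to the page. The x values must therefore be scaled by `dims[1]` (width) and the y values by `dims[0]` (height). Changing it to: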
```python
words_abs_coords = [
    [[int(round(word['geometry'][0][0] * dims[1])), int(round(word['geometry'][0][1] * dims[0])), int(round(word['geometry'][1][0] * dims[1])), int(round(word['geometry'][1][1] * dims[0]))] for word in words]
    for words, dims in zip(page_words, page_dims)
]
```
should solve your problem :)
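If you prefer, the same fix can be wrapped in a small helper so the (height, width) order is handled in one place. This is a minimal sketch, assuming each word's `geometry` is the relative `((xmin, ymin), (xmax, ymax))` pair from `result.export()` and `dims` is the page's `dimensions` entry in (height, width) order; `to_abs_boxes` is a hypothetical name.

```python
def to_abs_boxes(words, dims):
    """Convert doctr word geometries to absolute [xmin, ymin, xmax, ymax] boxes.

    `words`: word dicts from result.export(); each `geometry` is assumed to be
    ((xmin, ymin), (xmax, ymax)) relative to the page, in [0, 1].
    `dims`: the page `dimensions` entry, assumed to be (height, width).
    """
    height, width = dims
    boxes = []
    for word in words:
        (x0, y0), (x1, y1) = word["geometry"]
        boxes.append([
            int(round(x0 * width)),   # x scales with the width
            int(round(y0 * height)),  # y scales with the height
            int(round(x1 * width)),
            int(round(y1 * height)),
        ])
    return boxes
```

With the question's variables, `words_abs_coords = [to_abs_boxes(words, dims) for words, dims in zip(page_words, page_dims)]` produces the same result as the corrected comprehension. The unpacking also works on the lists that come back after a round trip through `preds.json`.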
Cheers!