Reputation: 1032
Most of my target images contain only text elements, which is expected, since my main purpose is to extract text from them. But some of the target images contain non-text visual elements (actual images within the document), and I'd like to know which ones do.
Does DocumentAI have a way to do that?
I have tried to detect such images by checking the areas of the blocks of a page object in DocumentAI using Python:
def has_visual_elements(page):
    """Checks if a page likely contains non-text visual elements."""
    for block in page.blocks:
        if block.layout:
            vertices = block.layout.bounding_poly.vertices
            if len(vertices) < 4:
                continue  # skip blocks without a full bounding box
            # Calculate the area of the bounding box from opposite corners
            width = vertices[2].x - vertices[0].x
            height = vertices[2].y - vertices[0].y
            area = abs(width * height)
            if area > 10000:
                return True
    return False
If the area is bigger than a certain value, the block may contain non-text visual elements. But some images containing only text elements also return a big area value, so this doesn't solve it.
An image containing non-text visual elements in it:
Upvotes: 0
Views: 56
Reputation: 180
Document AI focuses on extracting textual content, not explicitly marking the presence of non-text visual elements within its standard text output formats.
If your goal is to identify non-text visual elements, I think a better way to do that is to use Vision API Object Localization. Each LocalizedObjectAnnotation identifies information about the object, the position of the object, and rectangular bounds for the region of the image that contains the object.
To get started, follow the setup steps in the Vision API documentation.
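As a rough sketch of that approach, assuming the google-cloud-vision Python client is installed and credentials are already configured, something like the following could flag images in which Object Localization finds at least one object. The helper names find_visual_objects and has_visual_elements_in_file and the min_score threshold of 0.5 are my own illustrative choices, not part of the API. Note that Object Localization is trained on everyday objects, so it may miss charts, diagrams, or logos.

```python
def find_visual_objects(client, image, min_score=0.5):
    """Return (name, score) pairs for localized objects above min_score.

    `client` is expected to behave like vision.ImageAnnotatorClient.
    `min_score` is an arbitrary illustrative threshold, not an API default.
    """
    response = client.object_localization(image=image)
    return [
        (obj.name, obj.score)
        for obj in response.localized_object_annotations
        if obj.score >= min_score
    ]


def has_visual_elements_in_file(path, min_score=0.5):
    """Hypothetical helper: True if any object is localized in the image file."""
    # Imported here so find_visual_objects can be used without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    return bool(find_visual_objects(client, image, min_score))
```

You would call has_visual_elements_in_file("scan.png") per image (the file name is just a placeholder) and tune min_score against a few known examples, since a low threshold can produce false positives on text-only pages.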
Upvotes: 0