jonah_w

Reputation: 1032

DocumentAI: detect whether an image contains non-text visual elements

Most of my target images contain only text, which is expected since my main purpose is to extract text from them. However, some of them also contain non-text visual elements (actual pictures embedded in the document), and I'd like to identify which ones do.

Does DocumentAI have a way to do that?

I tried to detect such images by checking the bounding-box area of each block in a Document AI page object, using Python:

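```python
def has_visual_elements(page, min_area=10000):
    """Checks if a page likely contains non-text visual elements.

    Heuristic: returns True if any block's bounding box is larger than
    min_area square pixels.
    """
    for block in page.blocks:
        vertices = block.layout.bounding_poly.vertices
        if not vertices:
            # No pixel coordinates (e.g. only normalized_vertices are set)
            continue
        xs = [v.x for v in vertices]
        ys = [v.y for v in vertices]
        # Area of the axis-aligned box that encloses every vertex
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area > min_area:
            return True
    return False
```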

The idea is that a block whose area exceeds a certain threshold may contain non-text visual elements. However, some images that contain only text also produce blocks with large areas, so this approach doesn't solve the problem.
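One likely reason for the false positives: a long paragraph can be returned as a single block, so its bounding box alone can exceed a fixed pixel threshold. The minimal sketch below computes a block's area from pixel `vertices` when present, and otherwise from `normalized_vertices` (0 to 1) scaled by the page size; in a real response those sizes would come from `page.dimension.width` and `page.dimension.height`. The `SimpleNamespace` objects and `make_block` helper are hypothetical stand-ins for the Document AI types, and the 1000×1400 page size is made up for illustration.

```python
from types import SimpleNamespace


def block_area(bounding_poly, page_width, page_height):
    """Area (in pixels) of the axis-aligned box around a bounding_poly.

    Uses pixel `vertices` when present; otherwise scales
    `normalized_vertices` by the page dimensions. Returns 0.0 if neither is set.
    """
    if bounding_poly.vertices:
        xs = [v.x for v in bounding_poly.vertices]
        ys = [v.y for v in bounding_poly.vertices]
    elif bounding_poly.normalized_vertices:
        xs = [v.x * page_width for v in bounding_poly.normalized_vertices]
        ys = [v.y * page_height for v in bounding_poly.normalized_vertices]
    else:
        return 0.0
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def make_block(x0, y0, x1, y1):
    # Hypothetical stand-in for a block that only has normalized vertices
    corners = [SimpleNamespace(x=x, y=y)
               for x, y in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]]
    poly = SimpleNamespace(vertices=[], normalized_vertices=corners)
    return SimpleNamespace(layout=SimpleNamespace(bounding_poly=poly))


# A text-only page (1000x1400 px) whose single block is one long paragraph
paragraph = make_block(0.1, 0.1, 0.9, 0.5)
area = block_area(paragraph.layout.bounding_poly, 1000, 1400)
print(area > 10000)  # the text block alone trips the 10000 threshold
```

Because any area threshold measures size rather than content, a large text block and a picture of the same size look identical to this check.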

An example image containing non-text visual elements: [image not included]

Upvotes: 0

Views: 56

Answers (1)

jggp1094

Reputation: 180

Document AI focuses on extracting textual content; its standard text output does not explicitly flag the presence of non-text visual elements.
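To illustrate what "textual content" means for blocks: each block's `layout.text_anchor.text_segments` holds `start_index`/`end_index` offsets into `document.text`, so a block describes a region of recognized text rather than a picture. The sketch below collects the text each block points to; the `SimpleNamespace` objects and the `segment`/`block` helpers are hypothetical stand-ins for real Document AI objects, used so the example runs on its own.

```python
from types import SimpleNamespace


def block_texts(document, page):
    """Return the text that each block on `page` refers to.

    Every block's layout.text_anchor.text_segments gives start/end
    offsets into document.text; the segments are concatenated in order.
    """
    texts = []
    for block in page.blocks:
        segments = block.layout.text_anchor.text_segments
        texts.append("".join(
            document.text[int(s.start_index):int(s.end_index)] for s in segments))
    return texts


def segment(start, end):
    # Hypothetical stand-in for a TextAnchor.TextSegment
    return SimpleNamespace(start_index=start, end_index=end)


def block(*segments):
    # Hypothetical stand-in for a Document.Page.Block
    anchor = SimpleNamespace(text_segments=list(segments))
    return SimpleNamespace(layout=SimpleNamespace(text_anchor=anchor))


doc = SimpleNamespace(text="Invoice 42\nTotal: 10 USD\n")
page = SimpleNamespace(blocks=[block(segment(0, 11)), block(segment(11, 25))])
print(block_texts(doc, page))
```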

If your goal is to identify non-text visual elements, I think a better approach is the Vision API's object localization feature. Each LocalizedObjectAnnotation identifies the object, its confidence score, and the rectangular bounds of the image region that contains it.
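A minimal sketch of that approach, assuming the `google-cloud-vision` client library is installed and application default credentials are configured. `localize_objects` follows the documented `ImageAnnotatorClient.object_localization` call; `summarize_objects`, `has_visual_elements`, and the 0.5 score cutoff are hypothetical choices for this question, not part of the API. The Vision import is done inside the function so the filtering helper can run without the library.

```python
def summarize_objects(annotations, min_score=0.5):
    """Return (name, score) pairs for objects at or above min_score."""
    return [(a.name, a.score) for a in annotations if a.score >= min_score]


def localize_objects(path):
    """Run Vision API object localization on a local image file.

    Requires the google-cloud-vision package and configured credentials.
    """
    from google.cloud import vision  # lazy import: only needed for the API call

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.object_localization(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.localized_object_annotations


def has_visual_elements(path, min_score=0.5):
    # Hypothetical rule: any confidently localized object counts as a visual element
    return bool(summarize_objects(localize_objects(path), min_score))
```

Usage would look like `has_visual_elements("page.png")`. Object localization reports named objects with confidence scores, so whether it catches every kind of embedded picture (logos, charts, diagrams) is worth checking against your own images before relying on the cutoff.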

To get started, follow the Vision API setup steps in the Google Cloud documentation.

Upvotes: 0
