user19213041

Reputation: 11

Reskewing GCP Document AI Result

GCP's Document AI pre-processes images to remove things like skew, and the bounding boxes it returns correspond to the pre-processed image, not the image sent to the API. I need to reskew them so that they correspond to the original image. I can bring the bounding boxes back to their original rotation, but I can't figure out how to rescale/adjust them for the crop that GCP added.

Below are links to the original image, processed image, and what the final bounding boxes look like on the page. As you can see, skew is being corrected for, but the boxes are not aligning with the words on the original image. Notably, you'll see that the title is 15% lower on the page than it should be.

Original image:

original image

GCP pre-processed image:

gcp pre-processed image

Text output:

text output

The API returns a transformation matrix:

[{'rows': 2, 'cols': 3, 'type': 6, 'data': 'r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA'}]

I decoded it into a skew angle of -100.84832000732422 using:

import base64
import math

import numpy as np

rows = matrix_data[0]["rows"]
cols = matrix_data[0]["cols"]
data_type_code = matrix_data[0]["type"]
dtype = np.float64  # assuming type code 6 means 64-bit floats (OpenCV CV_64F)
data_encoded = matrix_data[0]["data"]
data_binary = base64.b64decode(data_encoded)
matrix = np.frombuffer(data_binary, dtype=dtype).reshape((rows, cols))
rotation_radians = math.atan2(matrix[1, 0], matrix[0, 0])
skew_angle = math.degrees(rotation_radians)
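
For reference, a self-contained sketch of the same decoding step applied to the matrix shown above. It assumes that `type` 6 is the OpenCV code for 64-bit floats (`CV_64F`), hence `np.float64`; the names `transform`, `linear`, `translation` and `scale` are illustrative and not part of my actual code.

```python
import base64
import math

import numpy as np

# Transform exactly as returned by the API in this question.
transform = {
    "rows": 2,
    "cols": 3,
    "type": 6,  # assumed to be OpenCV's CV_64F, i.e. float64
    "data": "r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA",
}

raw = base64.b64decode(transform["data"])
matrix = np.frombuffer(raw, dtype=np.float64).reshape(
    transform["rows"], transform["cols"]
)

# Left 2x2 block: rotation (and any scale); last column: translation.
linear = matrix[:, :2]
translation = matrix[:, 2]

# Same angle formula as above, plus the length of the first column as a scale.
skew_angle = math.degrees(math.atan2(matrix[1, 0], matrix[0, 0]))
scale = math.hypot(matrix[0, 0], matrix[1, 0])

print(matrix)
print(skew_angle, scale, translation)
```

The angle only uses the first column of the matrix; the last column is a translation, which the angle alone does not capture.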

Each bounding box is being parsed with this:

box = Box(root=[Point(x=v.get("x", -1), y=v.get("y", -1)) for v in vertices])
box = box.skew_box(self.skew_angle)
box = box.scale_box(self.scale_x, self.scale_y)

This is how the ratios are calculated:

# These are the ratios for original image sent to the API
x_ratio = x_len / y_len if y_len != 0 else 1
y_ratio = y_len / x_len if x_len != 0 else 1
longest_side = "x" if x_len > y_len else "y"

# These are the ratios from the GCP-processed image
gcp_x_ratio = gcp_height / gcp_width if gcp_width != 0 else 1
gcp_y_ratio = gcp_width / gcp_height if gcp_height != 0 else 1
gcp_longest_side = "x" if gcp_width > gcp_height else "y"

if longest_side != gcp_longest_side:
    gcp_x_ratio, gcp_y_ratio = gcp_y_ratio, gcp_x_ratio

self.scale_x = x_ratio / gcp_x_ratio
self.scale_y = y_ratio / gcp_y_ratio

This is how the scaling is applied:

def scale_box(self, x: float, y: float, origin: Point = Point(x=0.5, y=0.5)):
    def scale_point_about_origin(point: Point, origin: Point) -> Point:
        translated_x = point.x - origin.x
        translated_y = point.y - origin.y
        scaled_x = translated_x * x
        scaled_y = translated_y * y
        return Point(x=scaled_x + origin.x, y=scaled_y + origin.y)

    return Box(root=[scale_point_about_origin(p, origin) for p in self.root])

For reference, the bounding boxes are being reskewed using the following code:

def skew_box(self, angle: float, width: float = 1, height: float = 1):
    if angle == 0:
        return self

    theta = math.radians(-angle)
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)
    origin_x = width / 2
    origin_y = height / 2

    def skew_point(point: Point) -> Point:
        x, y = point.x - origin_x, point.y - origin_y
        x_rotated = x * cos_theta - y * sin_theta
        y_rotated = x * sin_theta + y * cos_theta
        x_rotated, y_rotated = x_rotated + origin_x, y_rotated + origin_y

        return Point(x=x_rotated, y=y_rotated)

    return Box([skew_point(p) for p in self.root])
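
Note that `skew_box` defaults to `width=1, height=1`, so it rotates in normalized space. A minimal sketch of how that differs from rotating in pixel space on a non-square page, using plain tuples instead of the `Point`/`Box` classes; the page size and the `rotate` helper are made up for illustration:

```python
import math


def rotate(x, y, angle_deg, cx, cy):
    """Rotate (x, y) about (cx, cy) by angle_deg."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (
        dx * math.cos(t) - dy * math.sin(t) + cx,
        dx * math.sin(t) + dy * math.cos(t) + cy,
    )


width, height = 1000, 2000  # made-up page size in pixels
nx, ny = 0.5, 0.25  # normalized point: horizontally centered, a quarter down

# Rotate in normalized space, then convert to pixels.
rx, ry = rotate(nx, ny, 90, 0.5, 0.5)
via_normalized = (rx * width, ry * height)

# Convert to pixels first, then rotate about the pixel center.
via_pixels = rotate(nx * width, ny * height, 90, width / 2, height / 2)

print(via_normalized, via_pixels)
```

The two results differ in x (about 750 versus 1000), because a rotation in normalized coordinates is not a pure rotation once the axes are stretched back to different pixel lengths.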

Upvotes: 1

Views: 168

Answers (1)

ROSSA AI

Reputation: 11

x_len, y_len come from the original image size, and gcp_width, gcp_height come from page.image.width, page.image.height.

    transform = page.transforms[0]
    data_encoded = transform.data
    # The Python client already returns raw bytes here, so no base64 decoding
    data_binary = data_encoded
    matrix = np.frombuffer(data_binary, dtype=np.float64).reshape(
        transform.rows, transform.cols
    )

    a, b, tx = matrix[0]
    c, d, ty = matrix[1]

    rotation_radians = math.atan2(b, a)
    skew_angle = math.degrees(rotation_radians)

    # Compute scale
    scale_x = math.sqrt(a * a + b * b)
    scale_y = math.sqrt(c * c + d * d)

    (...)
    
    gcp_width = page.image.width
    gcp_height = page.image.height
    x_len, y_len = self.original_width, self.original_height

    # Convert vertices to a Box
    box = Box(root=[Point(x=v.x, y=v.y) for v in normalized_vertices])

    # Apply skew correction
    box = box.skew_box(-skew_angle)

    # Apply scaling
    # x_len, y_len are from original image size
    scale_factor_x = gcp_width / x_len
    scale_factor_y = gcp_height / y_len
    box = box.scale_box(scale_factor_x / scale_x, scale_factor_y / scale_y)
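
Since the matrix also carries a translation column (tx, ty), another option is to invert the whole affine transform instead of undoing rotation and scale separately. The sketch below assumes the matrix maps pixel coordinates of the original image to pixel coordinates of the processed image (the OpenCV `warpAffine` convention); that direction is an assumption and should be checked against a point whose position is known in both images. `invert_affine` and `to_original_pixels` are hypothetical helper names, and the example matrix is synthetic, not the one from the question.

```python
import numpy as np


def invert_affine(matrix: np.ndarray) -> np.ndarray:
    """Return the 2x3 inverse of a 2x3 affine matrix."""
    full = np.vstack([matrix, [0.0, 0.0, 1.0]])
    return np.linalg.inv(full)[:2]


def to_original_pixels(normalized_vertices, matrix, gcp_width, gcp_height):
    """Map normalized vertices on the processed page to original-image pixels."""
    pts = np.array(
        [[x * gcp_width, y * gcp_height, 1.0] for x, y in normalized_vertices]
    )
    return pts @ invert_affine(matrix).T


# Synthetic check: a 90 degree rotation plus a shift.
forward = np.array([[0.0, -1.0, 100.0], [1.0, 0.0, 0.0]])
original_point = np.array([30.0, 40.0, 1.0])
processed_point = forward @ original_point  # position on the processed page

gcp_width, gcp_height = 100, 200  # made-up processed image size
recovered = to_original_pixels(
    [(processed_point[0] / gcp_width, processed_point[1] / gcp_height)],
    forward,
    gcp_width,
    gcp_height,
)
print(recovered)
```

Dividing the result by the original width and height gives normalized coordinates on the original image again.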

Upvotes: 1
