Reputation: 11
GCP's Document AI is pre-processing images to remove things like skew. The bounding boxes it produces correspond to the pre-processed image, not the image sent to the API. I need to reskew them so that they correspond to the original image. I am able to bring the bounding boxes back to their original rotation, but I can't figure out how to rescale/adjust them for the crop that GCP added.
Below are links to the original image, processed image, and what the final bounding boxes look like on the page. As you can see, skew is being corrected for, but the boxes are not aligning with the words on the original image. Notably, you'll see that the title is 15% lower on the page than it should be.
Original image:
GCP pre-processed image:
Text output:
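For context on the approach below: Document AI's matrix is the forward affine transform from the original image to the pre-processed one, so an alternative to re-deriving angle and scale separately is to invert the 2×3 matrix and apply the inverse to the box vertices in pre-processed pixel coordinates, which undoes rotation, scale, and the crop translation in one step. A minimal NumPy sketch (not the code from the question; `invert_affine` and `apply_affine` are illustrative helper names):

```python
import numpy as np

def invert_affine(m: np.ndarray) -> np.ndarray:
    """Invert a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    linear = m[:, :2]                 # 2x2 rotation/scale/shear part
    t = m[:, 2]                       # translation column
    linear_inv = np.linalg.inv(linear)
    # Inverse affine: x = A^-1 (x' - t)  ->  [A^-1 | -A^-1 t]
    return np.hstack([linear_inv, (-linear_inv @ t)[:, None]])

def apply_affine(m: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    return points @ m[:, :2].T + m[:, 2]
```

Mapping the pre-processed-image vertices through `invert_affine(matrix)` should land them on the original image without any manual ratio bookkeeping, assuming the vertices are in pixels rather than normalized coordinates.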
The API returns a transformation matrix:
[{'rows': 2, 'cols': 3, 'type': 6, 'data': 'r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA'}]
I decoded it into a skew angle of -100.84832000732422 using:
import base64
import math

import numpy as np

rows = matrix_data[0]["rows"]
cols = matrix_data[0]["cols"]
data_type_code = matrix_data[0]["type"]  # 6 == OpenCV CV_64F
data_encoded = matrix_data[0]["data"]

data_binary = base64.b64decode(data_encoded)
dtype = np.float64  # matches data_type_code 6 (CV_64F, 8-byte floats)
matrix = np.frombuffer(data_binary, dtype=dtype).reshape((rows, cols))

rotation_radians = math.atan2(matrix[1, 0], matrix[0, 0])
skew_angle = math.degrees(rotation_radians)
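For what it's worth, the decode can be run end to end against the exact payload quoted above: the 64 base64 characters decode to 48 bytes, which reshape cleanly into a 2×3 float64 matrix (a self-contained check, no new logic):

```python
import base64
import math

import numpy as np

# The base64 payload from the API response quoted above
data_encoded = "r6uHAEEXyL/OYqWImW3vP2QKzkdFXH9AzmKliJlt77+vq4cAQRfIv5yIBglXqaZA"
data_binary = base64.b64decode(data_encoded)

# Type code 6 is OpenCV's CV_64F, i.e. 8-byte floats
matrix = np.frombuffer(data_binary, dtype=np.float64).reshape((2, 3))
rotation_radians = math.atan2(matrix[1, 0], matrix[0, 0])
skew_angle = math.degrees(rotation_radians)
```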
Each bounding box is being parsed with this:
box = Box(root=[Point(x=v.get("x", -1), y=v.get("y", -1)) for v in vertices])
box = box.skew_box(self.skew_angle)
box = box.scale_box(self.scale_x, self.scale_y)
This is how the ratios are calculated:
# These are the ratios for original image sent to the API
x_ratio = x_len / y_len if y_len != 0 else 1
y_ratio = y_len / x_len if x_len != 0 else 1
longest_side = "x" if x_len > y_len else "y"
# These are the ratios from the GCP-processed image
gcp_x_ratio = gcp_height / gcp_width if gcp_width != 0 else 1
gcp_y_ratio = gcp_width / gcp_height if gcp_height != 0 else 1
gcp_longest_side = "x" if gcp_width > gcp_height else "y"
if longest_side != gcp_longest_side:
    gcp_x_ratio, gcp_y_ratio = gcp_y_ratio, gcp_x_ratio
self.scale_x = x_ratio / gcp_x_ratio
self.scale_y = y_ratio / gcp_y_ratio
This is how the scale is applied:
def scale_box(self, x: float, y: float, origin: Point = Point(x=0.5, y=0.5)):
    def scale_point_about_origin(point: Point, origin: Point) -> Point:
        translated_x = point.x - origin.x
        translated_y = point.y - origin.y
        scaled_x = translated_x * x
        scaled_y = translated_y * y
        return Point(x=scaled_x + origin.x, y=scaled_y + origin.y)

    return Box(root=[scale_point_about_origin(p, origin) for p in self.root])
For reference, the bounding boxes are being reskewed using the following code:
def skew_box(self, angle: float, width: float = 1, height: float = 1):
    if angle == 0:
        return self

    theta = math.radians(-angle)
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)
    origin_x = width / 2
    origin_y = height / 2

    def skew_point(point: Point) -> Point:
        x, y = point.x - origin_x, point.y - origin_y
        x_rotated = x * cos_theta - y * sin_theta
        y_rotated = x * sin_theta + y * cos_theta
        x_rotated, y_rotated = x_rotated + origin_x, y_rotated + origin_y
        return Point(x=x_rotated, y=y_rotated)

    return Box(root=[skew_point(p) for p in self.root])
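As a quick sanity check, both helpers behave as expected on a unit square. This uses a minimal, hypothetical stand-in for the `Point`/`Box` models (the real ones look like Pydantic models), with the two methods copied from the question:

```python
import math
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Box:
    root: List[Point]

    def scale_box(self, x: float, y: float, origin: Point = Point(x=0.5, y=0.5)):
        def scale_point_about_origin(point: Point, origin: Point) -> Point:
            return Point(
                x=(point.x - origin.x) * x + origin.x,
                y=(point.y - origin.y) * y + origin.y,
            )

        return Box(root=[scale_point_about_origin(p, origin) for p in self.root])

    def skew_box(self, angle: float, width: float = 1, height: float = 1):
        if angle == 0:
            return self
        theta = math.radians(-angle)
        cos_theta, sin_theta = np.cos(theta), np.sin(theta)
        origin_x, origin_y = width / 2, height / 2

        def skew_point(point: Point) -> Point:
            x, y = point.x - origin_x, point.y - origin_y
            return Point(
                x=x * cos_theta - y * sin_theta + origin_x,
                y=x * sin_theta + y * cos_theta + origin_y,
            )

        return Box(root=[skew_point(p) for p in self.root])


# Scaling a unit square by 2 about its centre pushes each corner outward by 0.5
square = Box(root=[Point(0, 0), Point(1, 0), Point(1, 1), Point(0, 1)])
scaled = square.scale_box(2, 2)

# Rotating the same square by 90 degrees maps corner (0, 0) onto (0, 1)
rotated = square.skew_box(90)
```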
Upvotes: 1
Views: 168
Reputation: 11
x_len and y_len come from the original image size; gcp_width and gcp_height come from page.image.width and page.image.height.
transform = page.transforms[0]
# transform.data is already raw bytes here, so no base64 decode is needed
data_binary = transform.data
matrix = np.frombuffer(data_binary, dtype=np.float64).reshape(
    transform.rows, transform.cols
)
a, b, tx = matrix[0]
c, d, ty = matrix[1]
rotation_radians = math.atan2(b, a)
skew_angle = math.degrees(rotation_radians)
# Calculate scale
scale_x = math.sqrt(a * a + b * b)
scale_y = math.sqrt(c * c + d * d)
(...)
gcp_width = page.image.width
gcp_height = page.image.height
x_len, y_len = self.original_width, self.original_height
# Convert vertices to a Box
box = Box(root=[Point(x=v.x, y=v.y) for v in normalized_vertices])
# Apply skew correction
box = box.skew_box(-skew_angle)
# Apply scaling
# x_len, y_len are from original image size
scale_factor_x = gcp_width / x_len
scale_factor_y = gcp_height / y_len
box = box.scale_box(scale_factor_x / scale_x, scale_factor_y / scale_y)
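The rotation/scale decomposition above can be sanity-checked against an affine matrix built from known parameters (the numbers here are made up for illustration):

```python
import math

import numpy as np

# Build a 2x3 affine from a known rotation and uniform scale (made-up values)
angle_deg = 12.0
scale = 1.25
theta = math.radians(angle_deg)
matrix = np.array([
    [scale * math.cos(theta), -scale * math.sin(theta), 4.0],
    [scale * math.sin(theta),  scale * math.cos(theta), -7.0],
])

a, b, tx = matrix[0]
c, d, ty = matrix[1]

recovered_scale_x = math.sqrt(a * a + b * b)  # recovers 1.25
recovered_scale_y = math.sqrt(c * c + d * d)  # recovers 1.25
# For this matrix layout, atan2(b, a) recovers the angle with the opposite
# sign, while atan2(c, a) (as in the question) recovers the original sign
recovered_deg = math.degrees(math.atan2(c, a))
```

Whether `atan2(b, a)` or `atan2(c, a)` is correct for your pipeline depends on which direction you then rotate the boxes, which is likely why the answer negates the angle when calling `skew_box(-skew_angle)`.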
Upvotes: 1