Ξένη Γήινος

Reputation: 3072

How to extract several rotated shapes from an image?

I took an online visual IQ test in which many of the questions look like the following:

The addresses of the images are:

[f"https://www.idrlabs.com/static/i/eysenck-iq/en/{i}.png" for i in range(1, 51)]

In these images there are several shapes that are almost identical and nearly the same size. Most of these shapes can be obtained from the others by rotation and translation, but exactly one shape can only be obtained from the others by reflection: it has a different chirality from the rest, and it is "the odd man out". The task is to find it.

The answers here are 2, 1, and 4, respectively. I would like to automate it.

And I nearly succeeded.

First, I download the image and load it with cv2.

Then I threshold the image, invert the values, and find the contours. From those I keep the largest contours.

Now I need to extract the shape associated with each contour and make the shape stand upright. This is where I am stuck: I nearly succeeded, but there are edge cases.

My idea is simple: find the minimum-area bounding box of the contour, rotate the image so that the rectangle is upright (all sides parallel to the grid lines, longest sides vertical), compute the new coordinates of the rectangle, and finally extract the shape with array slicing.

I have implemented what I described:

import cv2
import requests
import numpy as np

# Download the image and decode it straight from the response bytes:
img = cv2.imdecode(
    np.asarray(
        bytearray(
            requests.get(
                "https://www.idrlabs.com/static/i/eysenck-iq/en/5.png"
            ).content,
        ),
        dtype=np.uint8,
    ),
    -1,
)


def get_contours(image):
    # Threshold to binary, invert so the shapes become the white
    # foreground, and find the external contours.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 128, 255, 0)
    thresh = ~thresh
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours


def find_largest_areas(contours):
    # Sort the contour areas in descending order and keep the leading run
    # of near-equal areas: stop once an area drops below 95% of the
    # running mean of the areas kept so far.
    areas = [cv2.contourArea(contour) for contour in contours]
    area_ranks = [(area, i) for i, area in enumerate(areas)]
    area_ranks.sort(key=lambda x: -x[0])
    for i in range(1, len(area_ranks)):
        avg = sum(e[0] for e in area_ranks[:i]) / i
        if area_ranks[i][0] < avg * 0.95:
            break
    else:
        # No drop found: keep every contour.
        i = len(area_ranks)

    return {e[1] for e in area_ranks[:i]}


def find_largest_shapes(image):
    contours = get_contours(image)
    area_ranks = find_largest_areas(contours)
    contours = [e for i, e in enumerate(contours) if i in area_ranks]
    rectangles = [cv2.minAreaRect(contour) for contour in contours]
    rectangles.sort(key=lambda x: x[0])
    return rectangles


def rotate_image(image, angle):
    # Rotate about the image center, enlarging the canvas so the rotated
    # image is not clipped.
    size_reverse = np.array(image.shape[1::-1])  # (width, height)
    M = cv2.getRotationMatrix2D(tuple(size_reverse / 2.0), angle, 1.0)
    MM = np.absolute(M[:, :2])
    size_new = MM @ size_reverse
    M[:, -1] += (size_new - size_reverse) / 2.0
    return cv2.warpAffine(image, M, tuple(size_new.astype(int)))


def int_sort(arr):
    # Round to the nearest integer and sort ascending.
    return np.sort(np.intp(np.floor(arr + 0.5)))


RADIANS = {}


def rotate(x, y, angle):
    # Rotate the points (x, y) about the origin by `angle` degrees,
    # caching the sine/cosine per angle.
    if pair := RADIANS.get(angle):
        cosa, sina = pair
    else:
        a = angle / 180 * np.pi
        cosa, sina = np.cos(a), np.sin(a)
        RADIANS[angle] = (cosa, sina)

    return x * cosa - y * sina, y * cosa + x * sina


def new_border(x, y, angle):
    # Width and height of the axis-aligned box containing the four
    # corner points after rotation.
    nx, ny = rotate(x, y, angle)
    nx = int_sort(nx)
    ny = int_sort(ny)
    return nx[3] - nx[0], ny[3] - ny[0]


def coords_to_pixels(x, y, w, h):
    # Convert center-origin coordinates back to pixel indices and return
    # the top-left corner plus the extents of the bounding box.
    cx, cy = w / 2, h / 2
    nx, ny = x + cx, cy - y
    nx, ny = int_sort(nx), int_sort(ny)
    a, b = nx[0], ny[0]
    return a, b, nx[3] - a, ny[3] - b


def new_contour_bounds(pixels, w, h, angle):
    # Locate the rectangle's corners in the coordinates of the rotated,
    # enlarged image produced by rotate_image.
    cx, cy = w / 2, h / 2
    x = np.array([-cx, cx, cx, -cx])
    y = np.array([cy, cy, -cy, -cy])
    nw, nh = new_border(x, y, angle)
    bx, by = pixels[..., 0] - cx, cy - pixels[..., 1]
    nx, ny = rotate(bx, by, angle)
    return coords_to_pixels(nx, ny, nw, nh)


def extract_shape(rectangle, image):
    # Rotate the whole image so the min-area rectangle becomes upright,
    # slice the rectangle out, and rotate the crop 90 degrees if it came
    # out wider than tall.
    box = np.intp(np.floor(cv2.boxPoints(rectangle) + 0.5))
    h, w = image.shape[:2]
    angle = -rectangle[2]
    x, y, dx, dy = new_contour_bounds(box, w, h, angle)
    image = rotate_image(image, angle)
    shape = image[y : y + dy, x : x + dx]
    sh, sw = shape.shape[:2]
    if sh < sw:
        shape = np.rot90(shape)

    return shape


rectangles = find_largest_shapes(img)
for rectangle in rectangles:
    shape = extract_shape(rectangle, img)
    cv2.imshow("", shape)
    cv2.waitKeyEx(0)

But it doesn't work perfectly:

[Five screenshots of the extracted shapes produced by the code above]

As you can see, the crop includes everything inside the bounding rectangle, not just the main shape bounded by the contour, so bits of the neighbouring shapes stick in. I want the extracted shape to contain only the area bounded by the contour.
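One fix I have considered (not yet in the code above) is to blank out everything outside the contour before rotating and cropping, by painting the filled contour into a mask; a minimal sketch, assuming contour is the contour whose rectangle is being extracted:

mask = np.zeros(img.shape[:2], dtype=np.uint8)
cv2.drawContours(mask, [contour], -1, 255, cv2.FILLED)  # fill the contour interior
cleaned = img.copy()
cleaned[mask == 0] = 255  # paint everything outside the contour white (the background)

Extracting from cleaned instead of img should then leave only the shape itself, but I haven't worked this into extract_shape.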

And then there is the more serious problem: the bounding box somehow doesn't always align with the principal axis of the contour. As you can see in the last image, the shape doesn't stand upright and there are black areas.
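I suspect the angle should come from the contour itself rather than from cv2.minAreaRect; for example, the orientation of the principal axis can be derived from the second-order central moments. A minimal sketch of that idea (principal_axis_angle is my own hypothetical helper):

def principal_axis_angle(contour):
    # Orientation (degrees, measured from the x-axis) of the contour's
    # principal axis, from the second-order central moments.
    m = cv2.moments(contour)
    return 0.5 * np.degrees(np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"]))

But I don't know whether this would cover all my edge cases either.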

How to fix these problems?

Upvotes: 4

Views: 224

Answers (1)

stateMachine

Reputation: 5815

This is my take on the problem. It basically involves working with the contours themselves instead of the actual raster images. Use the Hu moments as shape descriptors; you can compute the moments directly on the arrays of points. Get two vectors/arrays: set one contour as the "objective/reference" and compare it against the "target" contours, looking for the maximum (Euclidean) distance between the two arrays. The contour that produces the maximum distance is the mismatched contour.
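For reference, OpenCV also packages this comparison directly as cv2.matchShapes, which internally compares the log-scaled Hu moments of two contours; a minimal sketch, assuming contourA and contourB are two of the contour point arrays:

# Hu-moment-based dissimilarity; a smaller value means more similar shapes.
# CONTOURS_MATCH_I1 selects one of the three built-in metrics.
distance = cv2.matchShapes(contourA, contourB, cv2.CONTOURS_MATCH_I1, 0.0)

Below I compute and scale the moments manually instead, which makes the log scaling explicit.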

Keep in mind that the contours are stored unordered, which means the objective contour (contour number 0, the reference) might itself be the mismatched shape from the beginning. In that case we already have "maximum distance" between this contour and the rest, and we have to handle this case. Hint: check the distribution of the distances. If there is a real maximum distance, it will be larger than the rest and can be identified as an outlier.

Let's check the code. The first step is what you already have: look for target contours by applying an area filter. Let's keep the largest contours:

import numpy as np
import cv2
import math

# Set image path
directoryPath = "D://opencvImages//shapes//"

imageNames = ["01", "02", "03", "04", "05"]

# Loop through the image file names:
for imageName in imageNames:

    # Set image path:
    imagePath = directoryPath + imageName + ".png"

    # Load image:
    inputImage = cv2.imread(imagePath)

    # To grayscale:
    grayImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

    # Otsu:
    binaryImage = cv2.threshold(grayImage, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]

    # Contour list:
    # Store here all contours of interest (large area):
    contourList = []

    # Find the contour on the binary image:
    contours, hierarchy = cv2.findContours(binaryImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    for i, c in enumerate(contours):

        # Get blob area:
        currentArea = cv2.contourArea(c)

        # Set min area:
        minArea = 1000

        if currentArea > minArea:
            # Approximate the contour to a polygon:
            contoursPoly = cv2.approxPolyDP(c, 3, True)

            # Get the polygon's bounding rectangle:
            boundRect = cv2.boundingRect(contoursPoly)

            # Get contour centroid:
            cx = int(int(boundRect[0]) + 0.5 * int(boundRect[2]))
            cy = int(int(boundRect[1]) + 0.5 * int(boundRect[3]))

            # Store in dict:
            contourDict = {"Contour": c, "Rectangle": tuple(boundRect), "Centroid": (cx, cy)}

            # Into the list:
            contourList.append(contourDict)

Up to this point, I've filtered all contours above a minimum area. I've stored the following information in a dictionary: the contour itself (the array of points), its bounding rectangle, and its centroid. The latter two will come in handy when checking the results.

Now, let's compute the Hu moments and their distances. I'll set the first contour (index 0) as the objective/reference and the rest as targets. The main takeaway is that we are looking for the maximal distance between the reference's Hu-moment array and each target's array; that is what identifies the mismatched shape. One note of caution, though: the scale of the moments varies wildly, and whenever you compare things for similarity it is advisable to have all the features scaled to the same range. In this particular case I'll apply a power transform (log) to all features to scale them down:

    # Get total contours in the list:
    totalContours = len(contourList)

    # Deep copies of input image for results:
    inputCopy = inputImage.copy()
    contourCopy = inputImage.copy()

    # Set contour 0 as objective:
    currentDict = contourList[0]
    # Get objective contour:
    objectiveContour = currentDict["Contour"]

    # Draw objective contour in green:
    cv2.drawContours(contourCopy, [objectiveContour], 0, (0, 255, 0), 3)

    # Draw contour index on image:
    center = currentDict["Centroid"]
    cv2.putText(contourCopy, "0", center, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

    # Store contour distances here:
    contourDistances = []

    # Calculate log-scaled hu moments of objective contour:
    huMomentsObjective = getScaledMoments(objectiveContour)

    # Start from objectiveContour+1, get target contour, compute scaled moments and
    # get Euclidean distance between the two scaled arrays:

    for i in range(1, totalContours):
        # Set target contour:
        currentDict = contourList[i]
        # Get contour:
        targetContour = currentDict["Contour"]

        # Draw target contour in red:
        cv2.drawContours(contourCopy, [targetContour], 0, (0, 0, 255), 3)

        # Calculate log-scaled hu moments of target contour:
        huMomentsTarget = getScaledMoments(targetContour)

        # Compute Euclidean distance between the two arrays:
        contourDistance = np.linalg.norm(np.transpose(huMomentsObjective) - np.transpose(huMomentsTarget))
        print("contourDistance:", contourDistance)

        # Store distance along contour index in distance list:
        contourDistances.append([contourDistance, i])

        # Draw contour index on image:
        center = currentDict["Centroid"]
        cv2.putText(contourCopy, str(i), center, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

        # Show processed contours:
        cv2.imshow("Contours", contourCopy)
        cv2.waitKey(0)

That's a big snippet. A couple of things: I've defined the function getScaledMoments to compute the scaled array of Hu moments; I show this function at the end of the post. From here, I store the distance between the objective contour and each target in the contourDistances list, along with the index of the original target contour. This will later help me identify the mismatched contour.

I'm also using the centroids to label each contour. You can see the distance calculation in the following animation: the reference contour is drawn in green, while each target gets drawn in red:

Next, let's get the maximum distance. We have to handle the case where the reference contour was the mismatched shape all along. For this, I'll apply a very crude outlier detector. The idea is that one distance in the contourDistances list has to be larger than the rest, while the rest are more or less the same; the key word here is variation. I'll use the standard deviation of the distance list and only look for a maximum distance if the standard deviation is above a minimum threshold of one sigma:

    # Get maximum distance,
    # List to numpy array:
    distanceArray = np.array(contourDistances)

    # Get distance mean and std dev:
    mean = np.mean(distanceArray[:, 0:1])
    stdDev = np.std(distanceArray[:, 0:1])

    print("M:", mean, "Std:", stdDev)

    # Set contour 0 (default) as the contour that is different from the rest:
    contourIndex = 0
    # Sigma minimum threshold:
    minSigma = 1.0

    # If std dev from the distance array is above a minimum variation,
    # there's an outlier (max distance) in the array, thus, the real different
    # contour we are looking for:

    if stdDev > minSigma:
        # Get max distance:
        maxDistance = np.max(distanceArray[:, 0:1])
        # Set contour index (contour at index 0 was the objective!):
        contourIndex = np.argmax(distanceArray[:, 0:1]) + 1
        print("Max:", maxDistance, "Index:", contourIndex)

    # Fetch dissimilar contour, if found,
    # Get boundingRect:
    boundingRect = contourList[contourIndex]["Rectangle"]

    # Get the dimensions of the bounding rect:
    rectX = boundingRect[0]
    rectY = boundingRect[1]
    rectWidth = boundingRect[2]
    rectHeight = boundingRect[3]

    # Draw dissimilar (mismatched) contour in red:
    color = (0, 0, 255)
    cv2.rectangle(inputCopy, (int(rectX), int(rectY)),
              (int(rectX + rectWidth), int(rectY + rectHeight)), color, 2)
    cv2.imshow("Mismatch", inputCopy)
    cv2.waitKey(0)

At the end, I just use the bounding rectangle I stored at the beginning of the script to identify the mismatched shape:

This is the getScaledMoments function:

def getScaledMoments(inputContour):
    """Computes log-scaled hu moments of a contour array"""
    # Calculate Moments
    moments = cv2.moments(inputContour)
    # Calculate Hu Moments
    huMoments = cv2.HuMoments(moments)

    # Log-scale the Hu moments, preserving each moment's sign.
    # Note: this assumes no moment is exactly zero (log10(0) is undefined).
    for m in range(0, 7):
        huMoments[m] = -1 * math.copysign(1.0, huMoments[m]) * math.log10(abs(huMoments[m]))

    return huMoments

Here are some results:

The following result is artificial. I manually modified the image to create a case where the reference contour (the one at index 0) is the mismatched contour, so there is no need to look for a maximum distance (we already have the result):

Upvotes: 5
