Numba is 10x slower than the Python equivalent in a task it should be good at

Question

I have the following function:

import numpy as np
from math import sqrt, atan

def dewarp(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    pts_transformed = np.empty((0, 2))
    players_transformed = np.empty((0, 2))

    correctionRadius = sqrt(width ** 2 + height ** 2) / strength

    for x_p, y_p in pixels:
        newX = x_p - half_width
        newY = y_p - half_height

        distance = sqrt(newX ** 2 + newY ** 2)
        r = distance / correctionRadius

        if r == 0:
            theta = 1
        else:
            theta = atan(r) / r

        sourceX = int(half_width + theta * newX * zoom)
        sourceY = int(half_height + theta * newY * zoom)

        if 0 < sourceX < width and 0 < sourceY < height:
            destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]
            if (sourceX, sourceY) in pts:
                pts_transformed = np.vstack((pts_transformed, np.array([[x_p, y_p]])))
            if (sourceX, sourceY) in players:
                players_transformed = np.vstack((players_transformed, np.array([[x_p, y_p]])))

    return destination_image, pts_transformed, players_transformed

The arguments are:
- image and destination_image: both 3840x800x3 NumPy arrays (shape (800, 3840, 3))
- pixels: a list of (x, y) pixel coordinate pairs (I've also tried a double for loop instead, with the same result)
- strength and zoom: both floats
- pts and players: both Python sets of (x, y) tuples

The pure Python version takes about 4 seconds; the Numba version usually takes about 30 seconds. How is this possible?

I've checked dewarp.inspect_types(), and Numba does not appear to be falling back to object mode.

For convenience, if you'd like to reproduce this and check for yourself, you can use the following as image, destination_image, pts and players:

pts = {(70, 667),
 (70, 668),
 (71, 667),
 (71, 668),
 (1169, 94),
 (1169, 95),
 (1170, 94),
 (1170, 95),
 (2699, 86),
 (2699, 87),
 (2700, 86),
 (2700, 87),
 (3794, 641),
 (3794, 642),
 (3795, 641),
 (3795, 642)}

players = {(1092, 257),
 (1092, 258),
 (1093, 257),
 (1093, 258),
 (1112, 252),
 (1112, 253),
 (1113, 252),
 (1113, 253),
 (1155, 167),
 (1155, 168),
 (1156, 167),
 (1156, 168),
 (1158, 357),
 (1158, 358),
 (1159, 357),
 (1159, 358),
 (1246, 171),
 (1246, 172),
 (1247, 171),
 (1247, 172),
 (1260, 257),
 (1260, 258),
 (1261, 257),
 (1261, 258),
 (1280, 273),
 (1280, 274),
 (1281, 273),
 (1281, 274),
 (1356, 410),
 (1356, 411),
 (1357, 410),
 (1357, 411),
 (1385, 158),
 (1385, 159),
 (1386, 158),
 (1386, 159),
 (1406, 199),
 (1406, 200),
 (1407, 199),
 (1407, 200),
 (1516, 481),
 (1516, 482),
 (1517, 481),
 (1517, 482),
 (1639, 297),
 (1639, 298),
 (1640, 297),
 (1640, 298),
 (1806, 148),
 (1806, 149),
 (1807, 148),
 (1807, 149),
 (1807, 192),
 (1807, 193),
 (1808, 192),
 (1808, 193),
 (1834, 285),
 (1834, 286),
 (1835, 285),
 (1835, 286),
 (1875, 199),
 (1875, 200),
 (1876, 199),
 (1876, 200),
 (1981, 206),
 (1981, 207),
 (1982, 206),
 (1982, 207),
 (1990, 326),
 (1990, 327),
 (1991, 326),
 (1991, 327),
 (2021, 355),
 (2021, 356),
 (2022, 355),
 (2022, 356),
 (2026, 271),
 (2026, 272),
 (2027, 271),
 (2027, 272)}
image = np.zeros((800, 3840, 3))    
destination_image = np.zeros((800, 3840, 3))
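The reproduction above does not define pixels. Assuming it lists every (x, y) coordinate of the destination image (an assumption, since the question only calls it a list of pixel combinations), a hypothetical helper make_pixels could build it both as the list of tuples the loop expects and as the (N, 2) array that np.asarray(pixels) produces:

```python
import numpy as np

def make_pixels(height, width):
    """Hypothetical helper: every (x, y) coordinate of a height x width image."""
    # List of tuples, as consumed by `for x_p, y_p in pixels` in dewarp
    pixel_list = [(x, y) for y in range(height) for x in range(width)]
    # Same coordinates as an (N, 2) integer array, built without a Python loop.
    # mgrid's first output holds row indices (y), the second column indices (x);
    # ravel() walks rows first, so the order matches the list above.
    ys, xs = np.mgrid[0:height, 0:width]
    pixel_array = np.stack([xs.ravel(), ys.ravel()], axis=1)
    return pixel_list, pixel_array
```

For the question's 800 x 3840 image this is 3,072,000 pairs, so building the list of tuples has a noticeable cost of its own; the array form avoids that.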

Am I missing something? Is this just something numba cannot do? Should I write it differently? Thanks!

The line profiler shows that a large share of the time, though not the majority, is spent in NumPy calls. So there should be room for improvement, right?

javidcf · Accepted Answer

Whether or not you are using Numba, you should avoid growing an array incrementally in a loop: every np.vstack call allocates a new array and copies everything accumulated so far, which performs very badly. Instead, preallocate an array and fill it one row at a time. Since you may not know the exact size in advance, preallocate the largest possible size (for example len(pixels)) and slice off the unused space at the end. That said, your code can be vectorized in a fairly straightforward way.
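A minimal sketch of the preallocate-and-trim pattern, using a hypothetical helper collect_matches (not part of the original code) that mirrors the pts/players bookkeeping in the loop:

```python
import numpy as np

def collect_matches(coords, targets):
    """Hypothetical helper: return the (x, y) pairs from `coords` that are in `targets`.

    Allocates the worst-case size once (every pair matches) and trims at the
    end, instead of calling np.vstack once per match.
    """
    out = np.empty((len(coords), 2), dtype=np.int64)  # worst case: all pairs match
    n = 0  # number of rows filled so far
    for x, y in coords:
        if (x, y) in targets:  # set membership test, as in the original loop
            out[n, 0] = x
            out[n, 1] = y
            n += 1
    return out[:n]  # slice off the unused tail
```

The same idea applies to pts_transformed and players_transformed in dewarp: allocate len(pixels) rows for each up front and keep a separate counter for each.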

import numpy as np

def dewarp_vec(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    correctionRadius = np.sqrt(width ** 2 + height ** 2) / strength

    # Process all pixels at once: one row per (x, y) pair
    x_p, y_p = np.asarray(pixels).T
    newX = x_p - half_width
    newY = y_p - half_height
    distance = np.sqrt(newX ** 2 + newY ** 2)
    r = distance / correctionRadius
    # atan(r) / r is 0/0 at the image centre: silence the warning,
    # then use the limit value 1 there (as the loop version does)
    with np.errstate(invalid="ignore"):
        theta = np.arctan(r) / r
    theta[r == 0] = 1
    # astype truncates toward zero, like int() in the loop version
    sourceX = (half_width + theta * newX * zoom).astype(np.int32)
    sourceY = (half_height + theta * newY * zoom).astype(np.int32)
    # Keep only pixels whose source position lies inside the image
    m1 = (0 < sourceX) & (sourceX < width) & (0 < sourceY) & (sourceY < height)
    x_p, y_p, sourceX, sourceY = x_p[m1], y_p[m1], sourceX[m1], sourceY[m1]
    # Copy all valid pixels in a single fancy-indexing assignment
    destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]
    # Encode each (x, y) pair as one integer y * width + x so that
    # membership can be tested with np.isin instead of a Python set
    source_flat = sourceY * width + sourceX
    pts_x, pts_y = np.asarray(list(pts)).T
    pts_flat = pts_y * width + pts_x
    players_x, players_y = np.asarray(list(players)).T
    players_flat = players_y * width + players_x
    m_pts = np.isin(source_flat, pts_flat)
    m_players = np.isin(source_flat, players_flat)
    # Destination coordinates whose source was in pts / players, as (N, 2) arrays
    pts_transformed = np.stack([x_p[m_pts], y_p[m_pts]], axis=1)
    players_transformed = np.stack([x_p[m_players], y_p[m_players]], axis=1)
    return destination_image, pts_transformed, players_transformed

The part that differs most from your code is how to check whether (sourceX, sourceY) is in pts and players. For that, I computed "flat" pixel indices (y * width + x), which turn each coordinate pair into a single integer, and used np.isin instead (you can pass assume_unique=True if you know that neither input contains repeated coordinate pairs).
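A small illustration of the flat-index trick on made-up coordinates (the source values below are invented for the example; width = 3840 matches the question's image and the pts entries are taken from it):

```python
import numpy as np

width = 3840  # image width, as in the question's example

# Invented source coordinates: the first two are in pts, the third is not
source_x = np.array([70, 71, 100])
source_y = np.array([667, 668, 5])
pts = {(70, 667), (71, 668)}

# Encode each (x, y) pair as y * width + x; this is collision-free while 0 <= x < width
source_flat = source_y * width + source_x
pts_x, pts_y = np.asarray(list(pts)).T
pts_flat = pts_y * width + pts_x

# Vectorized membership test: one boolean per source pixel
mask = np.isin(source_flat, pts_flat)
print(mask)  # [ True  True False]
```

The encoding is only unambiguous while every x lies in [0, width), which holds in dewarp_vec because sourceX has already been filtered by the m1 mask.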
