Reputation: 406
I have the following function:
def dewarp(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    pts_transformed = np.empty((0, 2))
    players_transformed = np.empty((0, 2))

    correctionRadius = sqrt(width ** 2 + height ** 2) / strength

    for x_p, y_p in pixels:
        newX = x_p - half_width
        newY = y_p - half_height
        distance = sqrt(newX ** 2 + newY ** 2)
        r = distance / correctionRadius
        if r == 0:
            theta = 1
        else:
            theta = atan(r) / r
        sourceX = int(half_width + theta * newX * zoom)
        sourceY = int(half_height + theta * newY * zoom)
        if 0 < sourceX < width and 0 < sourceY < height:
            destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]
            if (sourceX, sourceY) in pts:
                pts_transformed = np.vstack((pts_transformed, np.array([[x_p, y_p]])))
            if (sourceX, sourceY) in players:
                players_transformed = np.vstack((players_transformed, np.array([[x_p, y_p]])))
    return destination_image, pts_transformed, players_transformed
The arguments are:
- image and destination_image: both NumPy arrays of shape (800, 3840, 3), i.e. 3840x800 pixels with 3 channels
- pixels: a list of (x, y) pixel coordinate pairs (I've tried a double for loop too, but the result is the same)
- strength and zoom: both floats
- pts and players: both Python sets
The pure Python version of this takes about 4 seconds, while the Numba version usually takes about 30 seconds. How is this possible?
I've used dewarp.inspect_types and numba appears to not be in object mode.
For convenience if you'd like to recreate the example, you can use this as image, destination image, pts and players and check for yourself:
pts = {(70, 667),
(70, 668),
(71, 667),
(71, 668),
(1169, 94),
(1169, 95),
(1170, 94),
(1170, 95),
(2699, 86),
(2699, 87),
(2700, 86),
(2700, 87),
(3794, 641),
(3794, 642),
(3795, 641),
(3795, 642)}
players = {(1092, 257),
(1092, 258),
(1093, 257),
(1093, 258),
(1112, 252),
(1112, 253),
(1113, 252),
(1113, 253),
(1155, 167),
(1155, 168),
(1156, 167),
(1156, 168),
(1158, 357),
(1158, 358),
(1159, 357),
(1159, 358),
(1246, 171),
(1246, 172),
(1247, 171),
(1247, 172),
(1260, 257),
(1260, 258),
(1261, 257),
(1261, 258),
(1280, 273),
(1280, 274),
(1281, 273),
(1281, 274),
(1356, 410),
(1356, 411),
(1357, 410),
(1357, 411),
(1385, 158),
(1385, 159),
(1386, 158),
(1386, 159),
(1406, 199),
(1406, 200),
(1407, 199),
(1407, 200),
(1516, 481),
(1516, 482),
(1517, 481),
(1517, 482),
(1639, 297),
(1639, 298),
(1640, 297),
(1640, 298),
(1806, 148),
(1806, 149),
(1807, 148),
(1807, 149),
(1807, 192),
(1807, 193),
(1808, 192),
(1808, 193),
(1834, 285),
(1834, 286),
(1835, 285),
(1835, 286),
(1875, 199),
(1875, 200),
(1876, 199),
(1876, 200),
(1981, 206),
(1981, 207),
(1982, 206),
(1982, 207),
(1990, 326),
(1990, 327),
(1991, 326),
(1991, 327),
(2021, 355),
(2021, 356),
(2022, 355),
(2022, 356),
(2026, 271),
(2026, 272),
(2027, 271),
(2027, 272)}
image = np.zeros((800, 3840, 3))
destination_image = np.zeros((800, 3840, 3))
Am I missing something? Is this just something numba cannot do? Should I write it differently? Thanks!
The line profiler shows that a lot of the time, but not the majority, is spent in NumPy calls. So there should be room for improvement, right?
Upvotes: 0
Views: 322
Reputation: 59701
Whether or not you are using Numba, you should avoid incrementally growing an array in a loop, since that performs very badly. Instead, preallocate an array and fill it row by row; since you may not know the exact size in advance, you can preallocate it with the largest possible size, such as len(pixels), and slice off the unused space at the end. However, your code can be vectorized in a more or less straightforward manner.
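Before the vectorized version, here is a minimal sketch of the preallocate-and-slice idea described above. The helper name collect_matches and the toy inputs are made up for illustration; it only isolates the "append on match" step of the loop, not the whole dewarp function.

```python
import numpy as np


def collect_matches(pixels, targets):
    """Return the (x, y) pairs from `pixels` that are in the set `targets`.

    Hypothetical helper, not part of the original code.
    """
    # Worst case: every pixel matches, so len(pixels) rows is always enough.
    out = np.empty((len(pixels), 2), dtype=np.int64)
    count = 0
    for x_p, y_p in pixels:
        if (x_p, y_p) in targets:
            out[count, 0] = x_p
            out[count, 1] = y_p
            count += 1
    # Slice off the unused tail; this is a view into the larger buffer.
    return out[:count]


# Toy data: a 5x4 grid of coordinates and three targets, one out of range.
pixels = [(x, y) for y in range(4) for x in range(5)]
targets = {(1, 2), (3, 0), (9, 9)}
matches = collect_matches(pixels, targets)
print(matches)
```

Each match costs one write into an existing buffer instead of a full copy of everything collected so far. If the large buffer should be released afterwards, out[:count].copy() would return an independent array.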
import numpy as np

def dewarp_vec(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    correctionRadius = np.sqrt(width ** 2 + height ** 2) / strength

    x_p, y_p = np.asarray(pixels).T
    newX = x_p - half_width
    newY = y_p - half_height
    distance = np.sqrt(newX ** 2 + newY ** 2)
    r = distance / correctionRadius
    theta = np.arctan(r) / r
    theta[r == 0] = 1
    sourceX = (half_width + theta * newX * zoom).astype(np.int32)
    sourceY = (half_height + theta * newY * zoom).astype(np.int32)
    m1 = (0 < sourceX) & (sourceX < width) & (0 < sourceY) & (sourceY < height)
    x_p, y_p, sourceX, sourceY = x_p[m1], y_p[m1], sourceX[m1], sourceY[m1]
    destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]

    source_flat = sourceY * width + sourceX
    pts_x, pts_y = np.asarray(list(pts)).T
    pts_flat = pts_y * width + pts_x
    players_x, players_y = np.asarray(list(players)).T
    players_flat = players_y * width + players_x
    m_pts = np.isin(source_flat, pts_flat)
    m_players = np.isin(source_flat, players_flat)
    pts_transformed = np.stack([x_p[m_pts], y_p[m_pts]], axis=1)
    players_transformed = np.stack([x_p[m_players], y_p[m_players]], axis=1)
    return destination_image, pts_transformed, players_transformed
The part that differs most from your code is how to check whether (sourceX, sourceY) is in pts and players. For that I computed the "flat" pixel indices and used np.isin instead (you may add assume_unique=True, but only if you know that neither of the two arrays passed to np.isin contains repeated values).
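To make the flat-index trick concrete, here is a small standalone sketch. The width of 10 and the coordinate values are made-up toy data. The encoding y * width + x is unique per pair only as long as 0 <= x < width, which the bounds mask in the function above takes care of.

```python
import numpy as np

width = 10  # hypothetical image width, used only to build flat indices

# Candidate source coordinates as parallel x / y arrays (toy values).
source_x = np.array([1, 4, 4, 7])
source_y = np.array([0, 2, 3, 2])

# A Python set of (x, y) pairs, as in the question.
pts = {(4, 2), (7, 2), (0, 0)}

# Encode each (x, y) pair as a single integer: y * width + x.
source_flat = source_y * width + source_x
pts_x, pts_y = np.asarray(list(pts)).T
pts_flat = pts_y * width + pts_x

# One vectorized membership test replaces a set lookup per pixel.
mask = np.isin(source_flat, pts_flat)
print(mask)
print(np.stack([source_x[mask], source_y[mask]], axis=1))
```

Here the pairs (4, 2) and (7, 2) are flagged, while (1, 0) and (4, 3) are not.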
Upvotes: 2
Reputation: 40703
I don't see why this algorithm would see any significant benefit from using numba. All the heavy lifting appears to be in the image copying and np.vstack sections. That's all in numpy, so numba won't help there. The way you iteratively use vstack also has terrible performance. You'd do better to build a list of sub-arrays and then stack them together all in one go at the end.
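A minimal sketch of that "collect in a list, convert once" pattern in plain Python follows. The helper name collect_with_list and the toy inputs are hypothetical, and the sketch collects plain tuples rather than sub-arrays, which is a small variation on the same idea.

```python
import numpy as np


def collect_with_list(pixels, targets):
    """Gather matching (x, y) pairs in a list, then build one array at the end.

    Hypothetical helper, not part of the original code.
    """
    rows = []
    for x_p, y_p in pixels:
        if (x_p, y_p) in targets:
            rows.append((x_p, y_p))  # amortized O(1), no array copy
    # Single conversion; reshape keeps a (0, 2) shape when nothing matched.
    return np.array(rows, dtype=np.int64).reshape(-1, 2)


result = collect_with_list([(0, 0), (1, 2), (3, 4)], {(1, 2), (3, 4)})
print(result)
```

Appending to a Python list does not copy the existing elements each time, whereas every np.vstack call allocates a new array and copies all rows collected so far.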
As to what the problem is, what does dewarp.inspect_types() output? It should show you where numba needs to interface with Python. If this is done anywhere in the loop then performance will suffer if your program is multi-threaded.
Upvotes: 1