Reputation: 1265
There's a color image, a numpy array of shape (h,w,3)
with N=h*w pixels; there's an array labels of shape (h,w)
, each label an integer between 1 and M. N is 10^6-10^7, M is 10^3-10^4.
I need to produce a result image (h,w,3) where the color of each pixel labelled l is the mean color of all pixels labelled l. I.e.:
def recolor1(image, labels):
result = np.empty(shape=(h,w,3))
for label in np.unique(labels):
mask = labels==label
mean = np.mean(image[mask], axis=0)
result[mask] = mean
return result
The code is straightforward, but runs in O(M.N) (the computation of mask
is O(N) and the loop runs M times).
An O(N) recolor2
is possible. Basically you go over the labels and image pixels twice. First to compute an auxiliary array, indexed by label, where you keep the sums of each primary and the number of pixels for that label. Then you compute the averages for each label. Then you go over labels and pixels again, computing result. The O(M) time to find the averages is noise.
With recolor2
written in Python, recolor1
and recolor2
break even for N=1000000 and M=1000 at ~4s. As expected, recolor1
's time grows linearly to ~20s for M=5000, while recolor2
's remains essentially the same.
4s for a relatively small image is not great and it will get much worse for larger images. I'm no expert in numpy and associated libraries. Is there an O(N) solution there?
Upvotes: 1
Views: 65
Reputation: 150735
Let's try np.bincount
and loop over the channels:
result = np.stack([np.bincount(labels.flat, weights=img[...,i].flat)[labels-1]
for i in range(3)],
axis=-1)
which takes about 35ms on my system with h,w,M = 1000,1000,1000
.
Note This compute the sum, but mean should be easy enough.
Upvotes: 2