Eduardo
Eduardo

Reputation: 1265

Can this be done faster with numpy?

There's a color image, a numpy array of shape (h,w,3) with N=h*w pixels; there's an array labels of shape (h,w), each label an integer between 1 and M. N is 10^6-10^7, M is 10^3-10^4.

I need to produce a result image (h,w,3) where the color of each pixel labelled l is the mean color of all pixels labelled l. I.e.:

def recolor1(image, labels):
    result = np.empty(shape=(h,w,3))
    for label in np.unique(labels):
        mask = labels==label
        mean = np.mean(image[mask], axis=0)
        result[mask] = mean
    return result

The code is straightforward, but runs in O(M.N) (the computation of mask is O(N) and the loop runs M times).

An O(N) recolor2 is possible. Basically you go over the labels and image pixels twice. First to compute an auxiliary array, indexed by label, where you keep the sums of each primary and the number of pixels for that label. Then you compute the averages for each label. Then you go over labels and pixels again, computing result. The O(M) time to find the averages is noise.

With recolor2 written in Python, recolor1 and recolor2 break even for N=1000000 and M=1000 at ~4s. As expected, recolor1's time grows linearly to ~20s for M=5000, while recolor2's remains essentially the same.

4s for a relatively small image is not great and it will get much worse for larger images. I'm no expert in numpy and associated libraries. Is there an O(N) solution there?

Upvotes: 1

Views: 65

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

Let's try np.bincount and loop over the channels:

result = np.stack([np.bincount(labels.flat, weights=img[...,i].flat)[labels-1] 
                   for i in range(3)],
                  axis=-1)

which takes about 35ms on my system with h,w,M = 1000,1000,1000.

Note This compute the sum, but mean should be easy enough.

Upvotes: 2

Related Questions