update elements using numpy array function

Question

I am trying to implement the k-means clustering algorithm for a small project. I came upon this article, which suggests that

K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.

That is exactly what I am doing: iterating over each element of the array to update it. For each element of the dataset (z elements in total), I look up the nearest centroid and store it in the cluster array:

    for i in range(z):
        clstr[i] = closest_center(data[i], cen)

and my update function is

    from math import fabs

    def closest_center(x, clist):
        # distance from pixel x to every centroid, computed one at a time
        dlist = [fabs(x - i) for i in clist]
        return clist[dlist.index(min(dlist))]

Since I am using a grayscale image, each pixel is a single value, so the Euclidean distance reduces to the absolute difference.

I noticed that OpenCV has this algorithm too. It takes less than 2 s to run, while mine takes more than 70 s. What is the article suggesting?

My images are imported as grayscale and represented as 2D NumPy arrays. I then flatten each one into a 1D array because that is easier to process.

Tonechas · Accepted Answer

The list comprehension is likely what slows down execution. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:

    import numpy as np

    def closest_center(x, clist):
        # clist must be a NumPy array so that x - clist is computed elementwise
        return clist[np.argmin(np.abs(x - clist))]
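The function above still has to be called once per pixel from a Python loop. As a rough sketch of what the quoted article may have in mind, the loop itself can also be replaced with broadcasting, so that all pixels are assigned in a single call. The names assign_clusters and update_centers are hypothetical, and the sketch assumes data is the flattened 1D image and centers is a 1D NumPy array of centroids, both converted to float first (subtracting uint8 pixel values directly would wrap around).

```python
import numpy as np

def assign_clusters(data, centers):
    # Broadcasting builds an (n, k) table of |pixel - centroid| distances.
    distances = np.abs(data[:, np.newaxis] - centers[np.newaxis, :])
    # Index of the nearest centroid for every pixel, shape (n,).
    return np.argmin(distances, axis=1)

def update_centers(data, labels, centers):
    # Hypothetical helper: recompute each centroid as the mean of its pixels.
    k = len(centers)
    sums = np.bincount(labels, weights=data, minlength=k)
    counts = np.bincount(labels, minlength=k)
    new_centers = centers.astype(float).copy()
    nonempty = counts > 0  # leave centroids with no pixels unchanged
    new_centers[nonempty] = sums[nonempty] / counts[nonempty]
    return new_centers

# Small made-up example: six pixel values and two centroids.
data = np.array([0, 10, 12, 200, 210, 255], dtype=float)
centers = np.array([5.0, 220.0])

labels = assign_clusters(data, centers)   # nearest-centroid index per pixel
clstr = centers[labels]                   # nearest-centroid value per pixel
new_centers = update_centers(data, labels, centers)
```

Here clstr plays the same role as the array filled by the original loop. The intermediate distance table holds n × k values, so for a very large image with many clusters it may be worth processing the pixels in chunks.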
