Reputation: 14318
I am trying to implement the k-means clustering algorithm for a small project. I came upon this article, which suggests that
K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.
I am doing exactly that: iterating over each element of the array to update it. For each element in the dataset, I assign it to a cluster by finding the nearest centroid, again by iterating through every centroid:
for i in range(len(data)):
    clstr[i] = closest_center(data[i], cen)
and my update function is
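from math import fabs

def closest_center(x, clist):
    # distance from x to every centroid (1-D data, so |x - c| is the Euclidean distance)
    dlist = [fabs(x - i) for i in clist]
    # return the centroid value with the smallest distance
    return clist[dlist.index(min(dlist))]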
Since I am using a grayscale image, each pixel is a single intensity value, so the absolute difference is the Euclidean distance.
I noticed that OpenCV also implements this algorithm (cv2.kmeans). It takes less than 2 s to run, while mine takes more than 70 s. What is the article suggesting?
My images are imported as grayscale and represented as 2-D numpy arrays. I flatten them into 1-D arrays because those are easier to process.
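As a minimal sketch of the flatten/restore round trip (assuming a uint8 grayscale image; the image and the centroid values below are hypothetical placeholders), the 1-D labels can be mapped back to the original 2-D layout with `reshape`. Converting to float before subtracting also avoids uint8 wraparound (for example, 5 - 10 wrapping to 251).

```python
import numpy as np

# Hypothetical 2-D grayscale image (values 0-255); replace with your loaded array
img = np.array([[0, 10, 250], [5, 245, 255]], dtype=np.uint8)

# Flatten to 1-D and convert to float so differences cannot wrap around
pixels = img.ravel().astype(np.float64)

# Hypothetical centroids; label each pixel with the index of its nearest centroid
centers = np.array([5.0, 250.0])
labels = np.abs(pixels[:, None] - centers).argmin(axis=1)

# Replace each pixel by its centroid value and restore the original 2-D shape
segmented = centers[labels].reshape(img.shape)
```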
Upvotes: 0
Views: 1511
Reputation: 13723
The list comprehension (and the Python-level loop around it) is likely what slows execution down. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:
import numpy as np

def closest_center(x, clist):
    # clist must be a NumPy array so that x - clist is computed elementwise
    clist = np.asarray(clist)
    return clist[np.argmin(np.abs(x - clist))]
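Going one step further, the outer loop over pixels can also be removed: broadcasting builds an (n, k) distance matrix, `argmin(axis=1)` assigns every point at once, and `np.bincount` computes each cluster mean. The following is a minimal sketch of a full k-means loop for 1-D data, not OpenCV's implementation; the name `kmeans_1d`, the random-sample initialization, and the `n_iter`/`seed` parameters are illustrative assumptions.

```python
import numpy as np

def kmeans_1d(data, k, n_iter=100, seed=0):
    """Hypothetical vectorized k-means for 1-D data (e.g. flattened grayscale pixels)."""
    data = np.asarray(data, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data values chosen at random
    # (raises ValueError if the data has fewer than k distinct values)
    centers = rng.choice(np.unique(data), size=k, replace=False)
    labels = np.zeros(data.shape[0], dtype=np.intp)
    for _ in range(n_iter):
        # Assignment step: (n, k) distance matrix via broadcasting, nearest centroid per row
        dist = np.abs(data[:, None] - centers[None, :])
        labels = dist.argmin(axis=1)
        # Update step: per-cluster sums and counts in one pass each
        sums = np.bincount(labels, weights=data, minlength=k)
        counts = np.bincount(labels, minlength=k)
        new_centers = centers.copy()
        nonempty = counts > 0  # keep the old centroid if a cluster became empty
        new_centers[nonempty] = sums[nonempty] / counts[nonempty]
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels
```

For an image, call it as `centers, labels = kmeans_1d(img.ravel(), k)` and reshape `labels` back to `img.shape`. The distance matrix holds n × k float64 values, so a one-megapixel image with k = 8 needs about 64 MB for it; that is the price of doing the whole assignment step in one vectorized operation.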
Upvotes: 1