mehmet

Reputation: 45

Manually entering medians as centroids of K-means, in Python

I have a 2D np.array with 3 columns, coming from 4 categories of registrations. I want to run K-means on this 3-column array to test whether it can automatically be clustered into 4 good-enough 3-dimensional clusters. So I initialize my centroids from the medians of the real categories (3 medians * 4 categories I want to cluster), and not from the means, because the data all come from a non-parametric distribution. I scaled my data and created an np.array of medians (3*4), but I get this error:

clean=[[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3]]

init_medians=np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]])
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)

TypeError: 'builtin_function_or_method' object is not subscriptable

I have tried changing the array to an np array, stacking etc., but it seems I cannot enter 3 medians per cluster. I think K-means can cluster in 3-dimensional spaces, right?

It worked when I initialized the centroids with 4 single values, but this is not what I want. The error is caused by the array I pass to init=. Is there a problem with my logic or my K-means knowledge, or is it a syntax problem?

Upvotes: 1

Views: 333

Answers (2)

seralouk

Reputation: 33147

PART 1:

TypeError: 'builtin_function_or_method' object is not subscriptable

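This is a pure numpy usage error: np.array is a function, so writing np.array[...] with square brackets tries to index the function object instead of calling it. It appears because you forgot to use parentheses () to define the numpy array, i.e. np.array([...]).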
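To make the difference concrete, here is a minimal sketch that reproduces the error with square brackets and then builds the array with parentheses. The toy values and the variable names bad, good and message are only illustrative.

```python
import numpy as np

# Square brackets try to index the function object itself, which raises TypeError.
try:
    bad = np.array[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
except TypeError as exc:
    message = str(exc)
    print(message)

# Parentheses call the function and build the array.
good = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(good.shape)
```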


PART 2:

First of all, in init_medians you passed 4 lists that did not all have the same length. The last list had 4 elements (i.e. [0.01, 0.02, 0.03, 0.04]) instead of the 3 needed to represent the cluster medians.

Second, KMeans's init argument expects an ndarray of shape (n_clusters, n_features). In your case, this should be a (4, 3) numpy array like the following:

init_medians=np.array( [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]] )
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)
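If the original category of each row is known, the (4, 3) array of medians can be built from the data rather than typed by hand. The following is only a sketch under assumptions: the data X, the category vector labels and the centers used to generate them are synthetic stand-ins, not the asker's data, and n_init=1 is chosen here because the starting centroids are fixed.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 4 known categories, 50 rows each, 3 scaled features.
centers = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.9, 0.5]])
labels = np.repeat(np.arange(4), 50)
X = centers[labels] + rng.normal(scale=0.03, size=(200, 3))

# One row of per-feature medians for each known category -> shape (4, 3).
init_medians = np.vstack([np.median(X[labels == k], axis=0) for k in range(4)])
assert init_medians.shape == (4, X.shape[1])

# A single run is enough, since the initial centroids are given explicitly.
model = KMeans(n_clusters=4, init=init_medians, n_init=1, max_iter=300)
model.fit(X)
print(model.cluster_centers_.shape)
```

Comparing model.labels_ with the known categories afterwards is one way to judge whether the clusters are "good enough".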

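PART 3: The data matrix X is best passed as a numpy array rather than a list of lists (scikit-learn converts array-like input internally, but an explicit array makes the shape easy to check).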
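A quick way to check that the data and the initial centroids agree on the number of features before fitting is sketched below. The values are toy numbers and features_match is just an illustrative name.

```python
import numpy as np

# Toy list of lists standing in for the data.
clean = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
X = np.asarray(clean, dtype=float)

init_medians = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# Both arrays must be 2-D and share the same number of columns (features).
features_match = X.ndim == 2 and init_medians.ndim == 2 and X.shape[1] == init_medians.shape[1]
print(X.shape, init_medians.shape, features_match)
```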

The full code:

clean=np.array([[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3]])

init_medians=np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]])
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)

Upvotes: 1

astrochoi

Reputation: 13

Didn't you simply forget to put parentheses after np.array?

init_medians=np.array([...])

Upvotes: 0
