Reputation: 45
I have a 2D np.array with 3 columns, coming from 4 categories of registrations. I want to run K-means on this 3-column array to test whether it can be clustered automatically into 4 good-enough 3-dimensional clusters. So I initialize my centroids from the medians of the real categories (3 medians for each of the 4 categories I want to cluster), rather than from the means, because the data all come from a non-parametric distribution. I scaled my data and created an np.array of medians (4 rows of 3 values), but I get this error:
clean=[[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3]]
init_medians=np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]])
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)
TypeError: 'builtin_function_or_method' object is not subscriptable
I have tried converting the array to an np.array, stacking it, etc., but it seems I cannot enter 3 medians per cluster. K-means can cluster in 3-dimensional spaces, right?
It worked when I initialized the centroids with 4 single values, but this is not what I want. The error is caused by the array I pass to init=. Is there a problem with my logic, with my understanding of K-means, or is it a syntax problem?
Upvotes: 1
Views: 333
Reputation: 33147
PART 1:
TypeError: 'builtin_function_or_method' object is not subscriptable
This error comes from the NumPy call, not from KMeans: it appears because you forgot the parentheses () when defining the NumPy array, so np.array was indexed with square brackets instead of being called.
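As a minimal sketch of what triggers this message (the variable names `rows` and `message` are only illustrative, and the values are taken from the question), indexing `np.array` with square brackets fails, while calling it with parentheses builds the array:

```python
import numpy as np

rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]]

# Wrong: square brackets try to index the function object itself.
try:
    np.array[rows]
    message = None
except TypeError as exc:
    message = str(exc)

# Right: parentheses call the function and build the array.
init_medians = np.array(rows)

print(message)             # the "not subscriptable" TypeError text
print(init_medians.shape)  # (4, 3)
```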
PART 2:
First of all, in init_medians you pass 4 lists, but they did not all have the same length. The last list has 4 elements (i.e. [0.01, 0.02, 0.03, 0.04]) instead of the 3 needed to represent the cluster medians.
Second, the init argument of KMeans expects an ndarray of shape (n_clusters, n_features).
In your case, this should be a (4, 3) NumPy array like the following:
init_medians=np.array( [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]] )
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)
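If the medians are computed from labelled data rather than typed by hand, the (n_clusters, n_features) shape can be checked before fitting. The following is only a sketch under stated assumptions: `X`, `labels`, `true_centers` and the random noise are hypothetical stand-ins for the scaled data and the known categories, and `n_init=1` is set explicitly because a single fixed initialization is supplied.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 4 known categories, 3 features, 25 rows each.
true_centers = np.array([[0.1, 0.2, 0.3],
                         [0.4, 0.5, 0.6],
                         [0.7, 0.8, 0.9],
                         [0.01, 0.02, 0.03]])
labels = np.repeat(np.arange(4), 25)  # known category of each row
X = true_centers[labels] + rng.normal(scale=0.01, size=(100, 3))

# One median per feature and per category -> shape (n_clusters, n_features).
init_medians = np.vstack([np.median(X[labels == k], axis=0) for k in range(4)])
assert init_medians.shape == (4, X.shape[1])

# A fixed init array means a single initialization is enough.
model = KMeans(n_clusters=4, init=init_medians, n_init=1, max_iter=300)
model.fit(X)

print(model.cluster_centers_.shape)  # (4, 3)
```

Note that the sample `clean` array in the question contains 18 identical rows, so it holds only one distinct point; real data with distinct rows is needed for 4 meaningful clusters.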
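PART 3: The data matrix X is best passed as a NumPy array rather than a list of lists. scikit-learn converts array-like input itself, but an explicit array makes the shape easy to check.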
The full code:
import numpy as np
from sklearn.cluster import KMeans
clean=np.array([[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3],[0.1, 0.2, 0.3]])
init_medians=np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.01, 0.02, 0.03]])
model = KMeans(n_clusters=4, max_iter=300, init=init_medians)
model.fit(clean)
Upvotes: 1
Reputation: 13
Did you simply forget to put parentheses after np.array?
init_medians=np.array([...])
Upvotes: 0