Reputation: 11
I am trying to run KMeans clustering over multidimensional features, but I get ValueError: setting an array element with a sequence.
Here is an example of what I have already tried:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
test = pd.DataFrame(np.random.randint(low=0, high=10, size=(30, 4)), columns=['a', 'b', 'c', 'd'])
test["combined1"] = test.loc(axis=1)["a","b"].values.tolist()
test["combined2"] = test.loc(axis=1)["c","d"].values.tolist()
test.drop(['a', 'b', 'c', 'd'],axis=1, inplace=True)
test.head()
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(test)
The KMeans fit fails with:
/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
490
491 """
--> 492 return array(a, dtype, copy=False, order=order)
493
494
ValueError: setting an array element with a sequence.
Upvotes: 1
Views: 1650
Reputation: 1009
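You are passing sequences into KMeans: each cell of combined1 and combined2 holds a list such as [8, 1], so the DataFrame cannot be converted into a numeric 2-D array, and that is why it does not work. Check the documentation for the fit() method, which accepts: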
X : array-like or sparse matrix, shape=(n_samples, n_features)
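One way to get that shape, as a rough sketch rather than the only fix: expand every list-valued column back into plain numeric columns before fitting. The snippet below rebuilds the DataFrame from the question (using `test[["a", "b"]]` for the column selection and a seeded `RandomState`, both of which are my own choices for reproducibility) and assumes every list in a column has the same length.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Rebuild the DataFrame from the question; the seed is only here for reproducibility.
rng = np.random.RandomState(0)
test = pd.DataFrame(rng.randint(low=0, high=10, size=(30, 4)),
                    columns=["a", "b", "c", "d"])
test["combined1"] = test[["a", "b"]].values.tolist()
test["combined2"] = test[["c", "d"]].values.tolist()
test = test.drop(columns=["a", "b", "c", "d"])

# Each list-valued column becomes an (n_samples, 2) numeric block;
# stacking the blocks side by side gives shape (n_samples, n_features).
X = np.hstack([np.array(test[col].tolist()) for col in ["combined1", "combined2"]])
print(X.shape)

# n_init is set explicitly because its default differs between scikit-learn versions.
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)
```

If the combined columns are not needed for anything else, it is simpler to skip building them and fit on the original a, b, c, d columns directly.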
Upvotes: 1