Reputation: 1265
I have a dataset with 6 columns, and after running KMeans with six clusters I need to visualize the clustering result. How can I do that? This is my KMeans clustering code:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)
I found another post, "How to visualize kmeans clustering on multidimensional data", but I could not understand the solution because I do not know what cluster is in that code.
I used the following code:
from sklearn.preprocessing import StandardScaler
from sklearn import cluster
scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)
scaled_features['cluster'] = y_kmeans
pd.tools.plotting.parallel_coordinates(scaled_features, 'cluster')
and it produces this error:
Traceback (most recent call last):
  File "<ipython-input-77-2e66d8a57100>", line 7, in <module>
    scaled_features['cluster'] = y_kmeans
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
My input data for clustering is a numpy array like this:
31.764833 35.632833 38.088500 39.877250 41.331917 42.923750
29.832750 34.567500 37.527417 39.621000 41.412583 43.023917
36.777167 41.151333 44.122500 46.237167 47.879083 49.832250
46.871500 52.006333 54.784583 57.099417 58.767833 60.674667
It has 6 columns and 1301 rows, but the columns do not have names.
Upvotes: 1
Views: 4827
Reputation: 46888
A few points: in later versions of pandas it should be pd.plotting.parallel_coordinates rather than pd.tools.plotting.parallel_coordinates, and it is easier if you make your predictors a data frame. For example:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.decomposition import PCA
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
scaler = StandardScaler()
scaled_features = pd.DataFrame(scaler.fit_transform(X))
If you can, give column names:
scaled_features.columns = iris.feature_names
Run KMeans and assign the cluster labels:
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
scaled_features['cluster'] = kmeans.predict(scaled_features)
Plot:
pd.plotting.parallel_coordinates(scaled_features, 'cluster')
Or do some dimension reduction on your features and plot:
from sklearn.manifold import MDS
import seaborn as sns
embedding = MDS(n_components=2)
mds = pd.DataFrame(embedding.fit_transform(scaled_features.drop('cluster',axis=1)),
                   columns = ['component1','component2'])
mds['cluster'] = kmeans.predict(scaled_features.drop('cluster',axis=1))
sns.scatterplot(data=mds,x = "component1",y="component2",hue="cluster")
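PCA is imported above but never used; it is another common way to get a 2-D view of the clusters. Here is a minimal sketch, assuming a 1301 x 6 array like the one in the question. The random array is only a hypothetical stand-in for psnr_bitrate, and the names component1/component2 are reused from the MDS example for convenience:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's data: 1301 rows, 6 unnamed columns
rng = np.random.default_rng(0)
psnr_bitrate = rng.normal(loc=40.0, scale=8.0, size=(1301, 6))

scaled = StandardScaler().fit_transform(psnr_bitrate)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
labels = kmeans.fit_predict(scaled)  # cluster label for every row

# Project the 6 scaled features onto the first two principal components
pca = PCA(n_components=2)
proj = pd.DataFrame(pca.fit_transform(scaled), columns=["component1", "component2"])
proj["cluster"] = labels

# Share of the total variance kept by each of the two components
print(pca.explained_variance_ratio_)
```

The resulting proj frame can be passed to sns.scatterplot in the same way as mds above. Unlike MDS, PCA is a linear projection, so it is worth checking the printed variance ratios to see how much of the data the 2-D picture actually retains.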
Upvotes: 2
Reputation: 2402
scaled_features is a numpy array, and you cannot index a numpy array with a string. You need to convert it to a DataFrame first:
scaled_features = pd.DataFrame(scaled_features)
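Since the columns in the question have no names, it may help to see the conversion end to end. This is only a sketch: the random array is a hypothetical stand-in for psnr_bitrate, and the col_0 ... col_5 names are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's psnr_bitrate array (1301 rows, 6 unnamed columns)
rng = np.random.default_rng(42)
psnr_bitrate = rng.normal(loc=40.0, scale=8.0, size=(1301, 6))

scaled = StandardScaler().fit_transform(psnr_bitrate)  # still a plain numpy array

# Wrap the array in a DataFrame; the column names are made up for this example
columns = [f"col_{i}" for i in range(scaled.shape[1])]
scaled_features = pd.DataFrame(scaled, columns=columns)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
# fit_predict fits the model and returns each row's cluster label in one call
scaled_features["cluster"] = kmeans.fit_predict(scaled_features[columns])

print(scaled_features.head())
```

With the string column in place, the assignment that raised the IndexError works, and the frame can be handed to pd.plotting.parallel_coordinates(scaled_features, 'cluster').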
Upvotes: 0