Reputation: 2067
I have a dataset like this:
                     0        1        2        3         4        5
Unnamed: 0           X        Y        Z        L         a        b
green leaf     15.4999  20.9143  8.15938  52.8556  -23.6196  34.4027
yellow flower  38.4721  41.3847  4.41641  70.4446  -2.74272  80.3299
green leaf     8.42304  10.2697  4.58244  38.3222  -11.2275  24.0959
yellow flower  59.1535  65.6835  42.2067  84.8347  -7.73898  28.0364
I use the L, a, b columns to predict cluster assignments and get the result y_pred, like:
[1 2 1 1 ...]
But I'd like the result below instead:
cluster1: green leaf, green leaf, yellow flower
cluster2: yellow flower
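The code I've used is:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

df = np.transpose(pd.read_excel('color_xyz_lab.xlsx'))
val_all = np.array(df.values[1:, :], dtype=np.float64)  # skip the X/Y/Z/L/a/b header row
val_lab = val_all[:, 3:6]  # keep only the L, a, b columns
y_pred = KMeans(n_clusters=4, random_state=0).fit_predict(val_lab)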
Upvotes: 1
Views: 402
Reputation: 79338
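You could group the row labels by cluster and then join each group into one string (df.index[1:] skips the 'Unnamed: 0' header row, which y_pred does not cover):
pd.DataFrame({'a': df.index[1:], 'cluster': y_pred}).groupby('cluster').a.agg(','.join).to_dict()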
{1: 'green leaf,green leaf,yellow flower', 2: 'yellow flower'}
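If you would rather not build an intermediate DataFrame, the same grouping can be sketched in plain Python. This is only an illustration: the helper name labels_by_cluster is made up here, and the labels and y_pred lists are hypothetical stand-ins copied from the example in the question, not real KMeans output.

```python
from collections import defaultdict


def labels_by_cluster(labels, y_pred):
    """Map each cluster id to the list of labels assigned to it."""
    groups = defaultdict(list)
    for label, cluster_id in zip(labels, y_pred):
        # int() normalizes NumPy integer ids to plain Python ints.
        groups[int(cluster_id)].append(label)
    return dict(groups)


# Hypothetical stand-ins for the question's row labels and predictions.
labels = ['green leaf', 'yellow flower', 'green leaf', 'yellow flower']
y_pred = [1, 2, 1, 1]

clusters = labels_by_cluster(labels, y_pred)
for cluster_id in sorted(clusters):
    print(f"cluster{cluster_id}: {', '.join(clusters[cluster_id])}")
```

With the question's data, labels would presumably be df.index[1:] (the index minus the header row) and y_pred the array returned by fit_predict.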
Upvotes: 2