4daJKong

Reputation: 2067

How to get clusters listed by row labels instead of the cluster numbers predicted by KMeans?

I have a dataset like this:

                                  0        1        2        3        4        5
Unnamed: 0                         X        Y        Z        L        a        b
green leaf                   15.4999  20.9143  8.15938  52.8556 -23.6196  34.4027
yellow flower                38.4721  41.3847  4.41641  70.4446 -2.74272  80.3299
green leaf                   8.42304  10.2697  4.58244  38.3222 -11.2275  24.0959
yellow flower                59.1535  65.6835  42.2067  84.8347 -7.73898  28.0364

I use the L, a, b columns to predict cluster assignments and get a result, y_pred, like:

[1 2 1 1 ...]

But I'd like the result below instead:

cluster1: green leaf, green leaf, yellow flower
cluster2: yellow flower

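The code I used is:

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    df = np.transpose(pd.read_excel('color_xyz_lab.xlsx'))  # one sample per row
    val_all = np.array(df.values[1:,:], dtype=np.float64)  # skip the X, Y, Z, L, a, b header row
    val_lab = val_all[:,3:6]  # keep only the L, a, b columns
    y_pred = KMeans(n_clusters= 4 , random_state=0).fit_predict(val_lab)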

Upvotes: 1

Views: 402

Answers (1)

Onyambu

Reputation: 79338

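You could pair each row label with its predicted cluster, group by cluster, and then collapse the labels into one string. Note that df.index still contains the 'Unnamed: 0' header row that the question's code skips with values[1:,:], so slice it the same way to keep the lengths equal:

pd.DataFrame({'a':df.index[1:],'cluster':y_pred}).groupby('cluster').a.agg(','.join).to_dict()
{1: 'green leaf,green leaf,yellow flower', 2: 'yellow flower'}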
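If you'd rather not build an intermediate DataFrame, the same grouping can be sketched with the standard library alone. This is only an illustration: `group_labels_by_cluster` is a hypothetical helper name, and `labels` and `y_pred` are hard-coded from the sample values in the question instead of coming from the Excel file and KMeans.

```python
from collections import defaultdict


def group_labels_by_cluster(labels, cluster_ids):
    """Map each cluster id to the list of row labels assigned to it."""
    if len(labels) != len(cluster_ids):
        raise ValueError("labels and cluster_ids must have the same length")
    groups = defaultdict(list)
    for label, cluster_id in zip(labels, cluster_ids):
        # int() also normalises NumPy integer types returned by fit_predict
        groups[int(cluster_id)].append(label)
    # Sort by cluster id so the printed order is stable
    return dict(sorted(groups.items()))


# Assumed sample data, copied from the question (not read from the real file)
labels = ["green leaf", "yellow flower", "green leaf", "yellow flower"]
y_pred = [1, 2, 1, 1]

clusters = group_labels_by_cluster(labels, y_pred)
for cluster_id, members in clusters.items():
    print(f"cluster{cluster_id}: {', '.join(members)}")
```

With the real data you would pass something like `df.index[1:]` and the array returned by `fit_predict` in place of the two hard-coded lists. Keeping the members as a list rather than a joined string makes it easy to count or filter them later.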

Upvotes: 2
