Kmeans Cluster for each group in pandas dataframe and assign clusters

Question

I would like to cluster X2 and X3 within each month group using k-means clustering. The two variables should be clustered together, not separately. I would also like to label the three clusters "strong", "average" and "weak" according to the mean of each cluster, where the cluster with the highest mean is the strong one. Below is my sample data set.

import pandas as pd

df = pd.DataFrame({
    'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
    'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
    'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
    'X3': [23,78,95,52,60,76,68,92,34,76,34,12],
})
df

I need to automate this. Since I have many columns, I would like to select the two columns by position in general (for example df.iloc[:, 2:4]). The clusters should be labelled as follows:

cluster 2="best"

cluster 1="average"

cluster 0="weak"

To find the best cluster, take the mean of each column within each cluster and then sum those means. The cluster with the highest sum is "best", the next one is "average", and the one with the lowest sum is "weak".

Please help. Thank you.

Shubham Sharma · Accepted Answer

`groupby` and `apply` a clustering function

We can group the dataframe by month and cluster the columns X2 and X3 using a custom clustering function:

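import pandas as pd
from sklearn.cluster import KMeans

cols = df.columns[2:4]  # X2 and X3, selected by position
mapping = {0: 'weak', 1: 'average', 2: 'best'}

def cluster(X):
    # Fit 3 clusters on one month's rows; the raw label numbers are arbitrary
    k_means = KMeans(n_clusters=3).fit(X)
    # Sum each cluster's column means, then dense-rank: 0 = lowest, 2 = highest
    return X.groupby(k_means.labels_)\
            .transform('mean').sum(axis=1)\
            .rank(method='dense').sub(1)\
            .astype(int).to_frame()

# group_keys=False keeps the original row index so the result aligns with df
df['Cluster_id'] = df.groupby('month', group_keys=False)[cols].apply(cluster)
df['Cluster_cat'] = df['Cluster_id'].map(mapping)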

   month  X1   X2  X3  Cluster_id Cluster_cat
0      1  30   10  23           0        weak
1      1  42   76  78           1     average
2      1  25  100  95           2        best
3      1  32   23  52           0        weak
4      1  12   65  60           1     average
5      2  10   94  76           2        best
6      2   4   67  68           2        best
7      2   6   24  92           1     average
8      2   5   67  34           0        weak
9      2  10   54  76           2        best
10     2  24   87  34           0        weak
11     2  21   81  12           0        weak
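
The ranking step can be checked by hand without running KMeans. The sketch below takes the month 2 rows from the sample data and a hypothetical set of raw labels (`raw_labels` stands in for `k_means.labels_`; the numbering is an assumption chosen to match the grouping in the output above), then applies the rule from the question: sum the per-cluster column means and rank the sums.

```python
import pandas as pd

# Month 2 rows from the question's sample data (X2 and X3 only).
X = pd.DataFrame(
    {"X2": [94, 67, 24, 67, 54, 87, 81],
     "X3": [76, 68, 92, 34, 76, 34, 12]},
    index=[5, 6, 7, 8, 9, 10, 11],
)

# Hypothetical raw labels standing in for k_means.labels_; numbering is arbitrary.
raw_labels = pd.Series([1, 1, 0, 2, 1, 2, 2], index=X.index)

# Per-cluster column means, summed across columns: one score per raw label.
scores = X.groupby(raw_labels).mean().sum(axis=1)

# Dense rank of the scores, shifted so the lowest score is 0 and the highest is 2.
ordered = scores.rank(method="dense").sub(1).astype(int)

# Translate each row's raw label into its ordered cluster id and category.
mapping = {0: "weak", 1: "average", 2: "best"}
cluster_id = raw_labels.map(ordered)
cluster_cat = cluster_id.map(mapping)

print(pd.DataFrame({"Cluster_id": cluster_id, "Cluster_cat": cluster_cat}))
```

Here the three scores are 116 (raw label 0), 145 (raw label 1) and 105 (raw label 2), so raw label 1 becomes "best", 0 becomes "average" and 2 becomes "weak", which agrees with the month 2 rows in the table. Because KMeans starts from random centroids, the raw labels (and occasionally the grouping itself) can differ between runs; passing `random_state` to `KMeans` makes a run repeatable.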
