melik

Reputation: 1332

Kmeans Cluster for each group in pandas dataframe and assign clusters

I would like to cluster X2 and X3 within each month group using k-means clustering. The two variables should be clustered jointly, not one at a time. I would also like to label the three clusters "strong", "average", and "weak" according to the mean of each cluster, where the highest mean marks the strong cluster. Below is my sample data set.

import pandas as pd

df = pd.DataFrame({
    'month': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2'],
    'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
    'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
    'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12],
})
df

I need to automate this, and since I have many columns, I would like to select the two columns by position in general (e.g. df.iloc[:, 2:4]) rather than by name. The label assigned to each cluster is:

cluster 2="best"

cluster 1="average"

cluster 0="weak"

To find the best cluster, take the mean of each column within the cluster and sum those means: the cluster with the highest sum is "best", the next is "average", and the lowest is "weak".
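As a worked illustration of this rule, here is a minimal sketch on the five month-1 rows. The raw cluster labels in the `label` column are hand-assigned for the example (an assumption, not the output of k-means); the point is only to show how the summed column means turn into the three names.

```python
import pandas as pd

# Month-1 rows of the sample data. The 'label' column is hand-assigned
# for illustration only; a real run would take it from KMeans.labels_.
toy = pd.DataFrame({
    'X2': [10, 76, 100, 23, 65],
    'X3': [23, 78, 95, 52, 60],
    'label': [1, 0, 2, 1, 0],
})

# Mean of each column per cluster, then summed across the two columns.
score = toy.groupby('label')[['X2', 'X3']].mean().sum(axis=1)

# Dense rank of the sums, shifted so the lowest sum gets 0.
rank = score.rank(method='dense').astype(int) - 1
names = rank.map({0: 'weak', 1: 'average', 2: 'best'})
print(names)
```

The raw label numbers carry no order of their own, which is why the names are derived from the ranked sums rather than from the labels directly.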

Please help, thank you.

Upvotes: 1

Views: 2460

Answers (1)

Shubham Sharma

Reputation: 71689

groupby and apply a clustering function

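We can group the dataframe by month and cluster the columns X2 and X3 within each group using a custom clustering function. The function ranks the clusters by the sum of their column means, so the id 0 always denotes the weakest cluster and 2 the best one: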

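import pandas as pd
from sklearn.cluster import KMeans

cols = df.columns[2:4]  # X2 and X3, selected by position
mapping = {0: 'weak', 1: 'average', 2: 'best'}

def cluster(X):
    # Fit k-means on the rows of one month only; random_state is fixed
    # here so that reruns give the same labels.
    k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    # Replace each row by its cluster's column means, sum them across
    # columns, and dense-rank the sums so that 0 = lowest, 2 = highest.
    return X.groupby(k_means.labels_)\
            .transform('mean').sum(axis=1)\
            .rank(method='dense').sub(1)\
            .astype(int)

# group_keys=False keeps the original row index so the result aligns with df
df['Cluster_id'] = df.groupby('month', group_keys=False)[cols].apply(cluster)
df['Cluster_cat'] = df['Cluster_id'].map(mapping)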

   month  X1   X2  X3  Cluster_id Cluster_cat
0      1  30   10  23           0        weak
1      1  42   76  78           1     average
2      1  25  100  95           2        best
3      1  32   23  52           0        weak
4      1  12   65  60           1     average
5      2  10   94  76           2        best
6      2   4   67  68           2        best
7      2   6   24  92           1     average
8      2   5   67  34           0        weak
9      2  10   54  76           2        best
10     2  24   87  34           0        weak
11     2  21   81  12           0        weak

Upvotes: 3
