Reputation: 1332
I would like to cluster X2 and X3 within each month group using k-means clustering. The two variables need to be clustered together, not separately. I would also like to label the three clusters "strong", "average" and "weak" according to the mean of each cluster: the cluster with the highest mean is the strong one. Below is my sample data set.
import pandas as pd

df = pd.DataFrame({
    'month': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2'],
    'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
    'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
    'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12],
})
df
I need to automate this, and since I have many columns I would like to select the two columns by position in general (for example df.iloc[:, 2:4]). The label for each cluster id is:
cluster 2="best"
cluster 1="average"
cluster 0="weak"
To find the best cluster, take the mean of each column within the cluster and sum those means. The cluster with the highest sum is "best", the next one is "average", and the lowest is "weak".
Please help. Thank you.
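The ranking rule described above can be illustrated on the month-1 rows with pandas alone. This is only a sketch: the cluster labels below are hypothetical and assigned by hand to show the arithmetic, and the names `points`, `labels`, `scores` and `names` are introduced just for this example.

```python
import pandas as pd

# Month-1 rows from the sample data (X2 and X3 only).
points = pd.DataFrame({'X2': [10, 76, 100, 23, 65],
                       'X3': [23, 78, 95, 52, 60]})

# Hypothetical cluster labels, chosen by hand for illustration only;
# in practice they would come from the clustering step.
labels = pd.Series([0, 1, 2, 0, 1], index=points.index)

# Score each cluster: mean of every column, then the sum of those means.
scores = points.groupby(labels).mean().sum(axis=1)

# Order clusters by score: lowest is "weak", middle "average", highest "best".
names = ['weak', 'average', 'best']
ranks = scores.rank(method='first').astype(int) - 1
label_to_name = ranks.map(lambda r: names[r])

points['category'] = labels.map(label_to_name)
print(scores)
print(points)
```

With these hand-picked labels the cluster scores are 54.0, 139.5 and 195.0, so cluster 0 is "weak", cluster 1 is "average" and cluster 2 is "best".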
Upvotes: 1
Views: 2460
Reputation: 71689
groupby and apply a clustering function
We can group the dataframe by month and cluster the columns X2 and X3 using a custom-defined clustering function:
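from sklearn.cluster import KMeans

cols = df.columns[2:4]  # X2 and X3
mapping = {0: 'weak', 1: 'average', 2: 'best'}

def cluster(X):
    k_means = KMeans(n_clusters=3).fit(X)
    # Replace each row by its cluster's column means, sum them, then dense-rank the sums
    return (X.groupby(k_means.labels_)
             .transform('mean').sum(axis=1)
             .rank(method='dense').sub(1)
             .astype(int).to_frame())

# group_keys=False keeps the original row index so the result aligns with df
df['Cluster_id'] = df.groupby('month', group_keys=False)[cols].apply(cluster)
df['Cluster_cat'] = df['Cluster_id'].map(mapping)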
   month  X1   X2  X3  Cluster_id Cluster_cat
0      1  30   10  23           0        weak
1      1  42   76  78           1     average
2      1  25  100  95           2        best
3      1  32   23  52           0        weak
4      1  12   65  60           1     average
5      2  10   94  76           2        best
6      2   4   67  68           2        best
7      2   6   24  92           1     average
8      2   5   67  34           0        weak
9      2  10   54  76           2        best
10     2  24   87  34           0        weak
11     2  21   81  12           0        weak
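Because k-means starts from random centres, the cluster assignment can in principle vary between runs. The following is a hedged variant, not the method above: it assumes a fixed `random_state` and an explicit `n_init` for repeatability, and it ranks clusters from `cluster_centers_` (which hold the per-column cluster means) instead of recomputing the means. The helper name `label_group` and the list `names` are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    'month': ['1'] * 5 + ['2'] * 7,
    'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
    'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12],
})
names = ['weak', 'average', 'best']

def label_group(X):
    # Fixed seed and explicit n_init are assumptions made for repeatability.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    # Each centre holds the per-column means of its cluster; sum them per cluster.
    scores = pd.Series(km.cluster_centers_.sum(axis=1))
    # Dense rank from 0 (lowest score) upwards.
    rank_of_cluster = scores.rank(method='dense').astype(int) - 1
    # Translate each row's raw k-means label into its rank.
    ranks = rank_of_cluster.to_numpy()[km.labels_]
    return pd.Series(ranks, index=X.index)

# Cluster every month separately, then stitch the pieces back together by index.
parts = [label_group(g[['X2', 'X3']]) for _, g in df.groupby('month')]
df['Cluster_id'] = pd.concat(parts)
df['Cluster_cat'] = df['Cluster_id'].map(dict(enumerate(names)))
print(df)
```

Looping over the groups and concatenating the results avoids relying on how `groupby(...).apply` arranges its output index, at the cost of a little more code.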
Upvotes: 3