Reputation: 3148
I'm doing EM clustering with 3 components on a dataset (x), which is just a dataframe with 15 features.
from sklearn import mixture
import pandas as pd
x=pd.read_csv('tr.csv', sep=';')
em = mixture.GMM(n_components=3)
em.fit(x)
Then I want to add a column to my dataframe that holds the cluster label of each row (for example, like using labels_ in the k-means approach). But the best I have found are the weights, and that does not seem correct:
x['CLUSTER'] = pd.Series(em.weights_, index=x.index).astype(str)
It gives me an error (something like: there are 100000 rows in your data but you are trying to append only 3).
So how can I get the cluster labels from the EM algorithm, and how can I insert them as a column, one label per row, in the original df?
Thanks!
Upvotes: 2
Views: 475
Reputation: 66795
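In order to get "labels" you need to call .predict(x), not .weights_. The .weights_ are (one of many!) parameters of the fitted distribution, not point-wise labels: there is one weight per mixture component, so with 3 components you get 3 values regardless of how many rows x has.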
x['CLUSTER'] = em.predict(x)
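For readers on a recent scikit-learn: mixture.GMM was deprecated in 0.18 and removed in 0.20, and mixture.GaussianMixture is its replacement. Below is a minimal sketch of the same idea with that class. The synthetic data, the column names f0..f14, and the random_state values are assumptions made only so the example runs on its own; they stand in for the tr.csv from the question.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for tr.csv (assumption): 300 rows, 15 numeric features.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(300, 15)),
                 columns=[f"f{i}" for i in range(15)])

# Fit a 3-component Gaussian mixture with EM.
em = GaussianMixture(n_components=3, random_state=0)
em.fit(x)

# weights_ holds one mixing weight per component, not one value per row.
print(em.weights_.shape)

# predict() returns one hard label per row; predict_proba() returns the
# per-row membership probabilities for each component.
labels = em.predict(x)
proba = em.predict_proba(x)

# Attach the labels as a new column; the lengths match because there is
# one label per row.
x["CLUSTER"] = labels
```

Note that once CLUSTER has been added, x has 16 columns, so a later call to em.predict should be given only the original 15 feature columns, for example x.drop(columns="CLUSTER").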
Upvotes: 2