Reputation: 14515
I am trying to visualize an elbow plot for my data using YellowBrick's KElbowVisualizer and SKLearn's Expectation Maximization algorithm class: GaussianMixture.
When I run this, I get the error in the title. (I have also tried ClassificationReport, but that fails as well.)
from sklearn.mixture import GaussianMixture
from yellowbrick.cluster import KElbowVisualizer

model = GaussianMixture()
data = get_data(data_name, preprocessor_name, train_split=0.75)
X, y, x_test, y_test = data

visualizer = KElbowVisualizer(model, k=(4,12))
visualizer.fit(X)     # Fit the data to the visualizer
visualizer.show()     # Finalize and render the figure
I cannot find anything in YellowBrick to help me estimate the number of components for expectation maximization.
Upvotes: 2
Views: 2361
Reputation: 1441
Building on @bbengfort's great answer, I used:
from sklearn.base import ClusterMixin
from sklearn.mixture import GaussianMixture

class GaussianMixtureCluster(GaussianMixture, ClusterMixin):
    """Subclass of GaussianMixture to make it a ClusterMixin."""

    def fit(self, X):
        super().fit(X)
        self.labels_ = self.predict(X)
        return self

    def get_params(self, **kwargs):
        output = super().get_params(**kwargs)
        output["n_clusters"] = output.get("n_components", None)
        return output

    def set_params(self, **kwargs):
        # Only translate n_clusters when it is actually passed, so other
        # set_params calls don't overwrite n_components with None
        if "n_clusters" in kwargs:
            kwargs["n_components"] = kwargs.pop("n_clusters")
        return super().set_params(**kwargs)
This lets you use any scoring metric, and works with the latest version of YellowBrick.
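For example, a minimal usage sketch (assuming X is a feature matrix as in the question; because the subclass is a ClusterMixin, force_model should not be needed):

from yellowbrick.cluster import KElbowVisualizer

# Any KElbow metric should work here, e.g. distortion, silhouette, or calinski_harabasz
visualizer = KElbowVisualizer(GaussianMixtureCluster(), k=(4, 12), metric="calinski_harabasz")
visualizer.fit(X)
visualizer.show()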
Upvotes: 3
Reputation: 5392
Yellowbrick uses the sklearn estimator type checks to determine if a model is well suited to the visualization. You can use the force_model param to bypass the type checking (though it seems that the KElbow documentation needs to be updated to mention this).
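For example, a minimal sketch of bypassing the check, reusing the question's setup (as explained next, this alone does not make the fit succeed):

from sklearn.mixture import GaussianMixture
from yellowbrick.cluster import KElbow

# force_model=True skips the estimator type check, so no YellowbrickTypeError is raised
oz = KElbow(GaussianMixture(), k=(4, 12), force_model=True)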
However, even though force_model=True gets you past the YellowbrickTypeError, it still does not mean that GaussianMixture works with KElbow. This is because the elbow visualizer is set up to work with the centroidal clustering API and requires both an n_clusters hyperparam and a labels_ learned param; expectation maximization models do not support this API.
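For reference, here is a sketch of the contract the elbow visualizer relies on, which centroidal clusterers such as KMeans satisfy (X is assumed to be a feature matrix):

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3)  # n_clusters hyperparam, adjustable via set_params
kmeans.fit(X)
print(kmeans.labels_)          # labels_ learned param, set during fit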
However, it is possible to create a wrapper around the Gaussian mixture model that will allow it to work with the elbow visualizer (and a similar method could be used with the classification report as well).
from sklearn.base import ClusterMixin
from sklearn.mixture import GaussianMixture
from yellowbrick.cluster import KElbow
from yellowbrick.datasets import load_nfl

class GMClusters(GaussianMixture, ClusterMixin):

    def __init__(self, n_clusters=1, **kwargs):
        # Map the n_clusters hyperparam that KElbow sets onto n_components
        kwargs["n_components"] = n_clusters
        super(GMClusters, self).__init__(**kwargs)

    def fit(self, X):
        super(GMClusters, self).fit(X)
        # Expose cluster assignments as labels_ for the elbow scorers
        self.labels_ = self.predict(X)
        return self

X, _ = load_nfl()

oz = KElbow(GMClusters(), k=(4, 12), force_model=True)
oz.fit(X)
oz.show()
This does produce a KElbow plot, though not a great one for this particular dataset.
Another answer mentioned Calinski-Harabasz scores, which you can use in the KElbow visualizer as follows:
oz = KElbow(GMClusters(), k=(4,12), metric='calinski_harabasz', force_model=True)
oz.fit(X)
oz.show()
Creating a wrapper isn't ideal, but for model types that don't fit the standard sklearn classifier or clusterer APIs, wrappers are often necessary, and this is a good strategy to have in your back pocket for a number of ML tasks.
Upvotes: 13
Reputation: 391
You can use sklearn's calinski_harabasz_score - see the relevant docs here.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import calinski_harabasz_score
from sklearn.mixture import GaussianMixture

scores = pd.DataFrame()
components = 100
# Score each candidate component count with Calinski-Harabasz
for n in range(2, components):
    model = GaussianMixture(n_components=n)
    y = model.fit_predict(X)
    scores.loc[n, 'score'] = calinski_harabasz_score(X, y)
plt.plot(scores.reset_index()['index'], scores['score'])
Something like this should give similar functionality.
Upvotes: 2