zlatko

Reputation: 650

How to use KDE (kernel density estimation) for one-dimensional array clustering in scikit-learn?

I have read several posts about clustering 1D arrays in which people say that clustering is not suitable for a 1D array and that kernel density estimation should be used instead. However, nobody explained how to actually perform clustering using KDE, i.e., how to retrieve cluster labels for the input data.

In scikit-learn, I computed a kernel density estimate for my univariate (one-dimensional) data:

from sklearn.neighbors import KernelDensity

kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(features)

How can I use it now for clustering, namely, how do I retrieve cluster labels for the input data?

I was considering two possible approaches:

a) Use the KDE to produce new 2D input data for some clustering estimator (e.g. k-means). I wanted to retrieve a 2D array of data in the form of a histogram ([value, frequency]), but I don't know how to obtain that from the KDE. Is it possible to use the KDE as a new input dataset for a clustering algorithm, say a k-means estimator? If yes, how? How can I get a dataset from the KDE?

b) Use the KDE directly to calculate the border between the clusters. In my particular case, I know that there are two clusters and I want to find the border between them. And I need to do it computationally, not manually by looking at a plot...

Upvotes: 0

Views: 2588

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77474

You don't run a clustering algorithm on a density estimate.

You want to find the local minima and maxima of the density; the minima are where to split the data.
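A minimal sketch of this approach with scikit-learn and SciPy (the features array, grid resolution, and bandwidth below are placeholder assumptions, not values from the question):

import numpy as np
from scipy.signal import argrelextrema
from sklearn.neighbors import KernelDensity

# Hypothetical 1D data with two clusters, standing in for the asker's features.
features = np.concatenate([np.random.normal(0, 1, 100),
                           np.random.normal(5, 1, 100)]).reshape(-1, 1)

kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(features)

# Evaluate the (log) density on a fine grid over the data range.
grid = np.linspace(features.min(), features.max(), 1000).reshape(-1, 1)
log_dens = kde.score_samples(grid)

# Local minima of the density are the split points between clusters.
minima = grid[argrelextrema(log_dens, np.less)[0]].ravel()

# Each sample's label is the index of the interval between consecutive minima.
labels = np.digitize(features.ravel(), minima)

With two well-separated modes this yields one split point and labels 0 and 1. Note that np.digitize assigns each sample to the interval between consecutive minima, so the number of clusters falls out of the density itself rather than being fixed in advance.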

Upvotes: 1
