How to check whether a new point is inside the existing clusters (Python)

Question

I am a bit confused about clustering, e.g. K-means clustering. I have already created clusters from the training data, and in the testing part I want to know whether the new points fall inside the existing clusters, i.e. whether they can belong to a cluster at all. My idea is to find the center of each cluster and the farthest point of each cluster in the training data. Then, in the testing part, if the distance of a new point from the center is greater than a threshold (e.g. 1.5x the distance of the farthest point), it cannot be in that cluster!

Is this idea efficient and correct, and is there a Python function to do this?
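A minimal sketch of the threshold idea from the question follows. It assumes scikit-learn's KMeans and NumPy. KMeans.transform() returns the Euclidean distance of each sample to every cluster center, and the sketch uses it to find each cluster's farthest training point ("radius") and then to test new points. The helper in_existing_cluster, the toy blob data and the 1.5 factor are illustrative choices, not a built-in scikit-learn function.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy training data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X_train = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# transform() gives an (n_samples, n_clusters) matrix of distances to each center;
# pick, for every training point, the distance to its own cluster's center.
train_dist = kmeans.transform(X_train)[np.arange(len(X_train)), kmeans.labels_]

# "Radius" of each cluster = distance of its farthest training point from the center.
radius = np.array([train_dist[kmeans.labels_ == k].max() for k in range(kmeans.n_clusters)])

def in_existing_cluster(X_new, factor=1.5):
    """Hypothetical helper: return (nearest-cluster label, accepted flag) per row.

    A point is accepted only if its distance to the nearest center is at most
    factor * radius of that cluster (1.5 is the example value from the question).
    """
    dist = kmeans.transform(X_new)
    label = dist.argmin(axis=1)
    nearest = dist[np.arange(len(X_new)), label]
    return label, nearest <= factor * radius[label]

labels, accepted = in_existing_cluster(np.array([[0.2, -0.1], [5.0, -20.0]]))
print(labels, accepted)
```

This is a heuristic: each radius is set by a single extreme training point, so one outlier in the training data can make a cluster's acceptance region much larger. The check itself needs only one distance per cluster center for each new point, roughly the same work as predict().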

One more question: could someone help me understand the difference between kmeans.fit() and kmeans.predict()? I get the same result from both functions!

I appreciate any help

Farseer · Accepted Answer

In general, when you fit the K-means algorithm, you get the cluster centers as the result.

So, if you want to test which cluster a new point belongs to, you calculate the distance between the point and each cluster center, and give the point the label of the closest cluster center.
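To make this assignment rule concrete, here is a sketch of nearest-center labeling done by hand with NumPy broadcasting, assuming an already fitted scikit-learn KMeans model on hypothetical toy data. nearest_center_label is an illustrative helper, not a library function; on a fitted model it should agree with kmeans.predict(), which applies the same closest-center rule.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy training data: two blobs around (0, 0) and (8, 8).
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(loc=m, scale=0.5, size=(40, 2)) for m in (0.0, 8.0)])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

def nearest_center_label(points, centers):
    """Label each point with the index of its closest center (Euclidean distance)."""
    # Broadcast to shape (n_points, n_centers, n_features), then take the norm over features.
    diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return dist.argmin(axis=1)

X_new = np.array([[0.3, -0.2], [7.5, 8.4]])
manual = nearest_center_label(X_new, kmeans.cluster_centers_)
print(manual, kmeans.predict(X_new))
```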

If you are using the scikit-learn library:

predict(X) predicts the closest cluster each sample in X belongs to.

fit(X) fits the model to the data, or in other words calculates the cluster centers.
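A small sketch of the difference, assuming scikit-learn's KMeans on hypothetical toy data. fit() returns the estimator itself and stores the learned centers in cluster_centers_ and the training labels in labels_; predict() only assigns points to the nearest already-learned center. Calling predict() on the same data that was used for fitting reproduces labels_, which is likely why both calls seemed to give the same result. The outputs only differ in purpose once predict() is given new points.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy training data: three blobs along the diagonal.
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(loc=m, scale=0.5, size=(30, 2)) for m in (0.0, 6.0, 12.0)])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# fit() learns the cluster centers and returns the estimator itself, not labels.
fitted = kmeans.fit(X_train)
print(fitted is kmeans)          # True
print(kmeans.cluster_centers_)   # learned centers, shape (3, 2)

# predict() only assigns points to the nearest learned center.
print(kmeans.predict(X_train))   # matches kmeans.labels_ on the training data
print(kmeans.predict(np.array([[6.2, 5.9]])))  # label for a new point
```

scikit-learn also provides fit_predict(X), documented as equivalent to calling fit(X) followed by predict(X).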

Here is a nice example of how to use K-means in scikit-learn
