Reputation: 3123
Does anyone have an idea how a simple k-means algorithm could be tuned to handle data sets of this form?
Upvotes: 0
Views: 327
Reputation: 6514
The most direct way to handle data of that form while still using k-means is to use a kernelized version of k-means. Two implementations of it exist in the JSAT library (see here https://github.com/EdwardRaff/JSAT/blob/67fe66db3955da9f4192bb8f7823d2aa6662fc6f/JSAT/src/jsat/clustering/kmeans/ElkanKernelKMeans.java)
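To make the idea concrete, here is a minimal NumPy sketch of kernel k-means. It is not the JSAT implementation (that one uses Elkan-style acceleration); it is the plain Lloyd-style loop, with distances to centroids computed purely from the kernel matrix. The function names, the RBF kernel, and the `gamma` parameter are my own choices for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, init_labels=None, max_iter=100, seed=0):
    # K is an (n, n) kernel matrix; returns an array of n cluster labels.
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    if init_labels is None:
        labels = rng.integers(0, k, size=n)  # random starting assignment
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            if not mask.any():
                continue  # an empty cluster stays at infinite distance
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].mean(axis=1)
                          + K[np.ix_(mask, mask)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

You would call it as `kernel_kmeans(rbf_kernel(X, gamma=0.5), k=2)`. With a linear kernel (`K = X @ X.T`) it reduces to ordinary k-means. Like ordinary k-means it only finds a local optimum, so the result depends on the starting assignment and on `gamma`; expect to try a few of each.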
As Nicholas said, another option is to create a new feature space on which you run k-means. However, this takes some prior knowledge of what kind of data you will be clustering.
After that, you really just need to move to a different algorithm. k-means is a simple algorithm that makes simple assumptions about the world, and when those assumptions are too strongly violated (clusters being linearly separable is one of those assumptions), you just have to accept that and pick a more appropriate algorithm.
Upvotes: 1
Reputation: 1793
One possible solution to this problem is to add another dimension to your data set, for which there is a split between the two classes.
Obviously this is not applicable in many cases, but if you have applied some sort of dimensionality reduction to your data, then it may be something worth investigating.
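As a rough illustration, suppose the data are two concentric rings (an assumption on my part; I am guessing at the shape of your data). The distance from the data centre is then a dimension with a clean split, and plain k-means on that one feature separates the rings. The helper names and the synthetic rings below are made up for the example.

```python
import numpy as np

def make_rings(n_per_ring=100, radii=(1.0, 5.0), noise=0.1, seed=0):
    # Synthetic stand-in data: two noisy concentric rings around the origin.
    rng = np.random.default_rng(seed)
    points, truth = [], []
    for label, r in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_ring)
        rad = r + rng.normal(0.0, noise, n_per_ring)
        points.append(np.column_stack([rad * np.cos(theta), rad * np.sin(theta)]))
        truth.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(truth)

def kmeans_1d_two_clusters(values, max_iter=100):
    # Plain k-means with k=2 on one feature, initialised at the min and max.
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(max_iter):
        # Label 1 if strictly closer to the upper centre, else label 0.
        closer_to_upper = np.abs(values - centers[1]) < np.abs(values - centers[0])
        labels = closer_to_upper.astype(int)
        new_centers = np.array([values[labels == c].mean() for c in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

X, truth = make_rings()
# The added dimension: distance of each point from the centre of the data.
radius = np.linalg.norm(X - X.mean(axis=0), axis=1)
labels = kmeans_1d_two_clusters(radius)
```

Here I cluster on the new feature alone to keep things simple. If you append it to the original columns instead, you will probably need to rescale so the new dimension is not swamped by the others.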
Upvotes: 1