arslan

Reputation: 2224

Which clustering algorithm is suitable for this task?

I want to cluster a set of data, which is as follows:

  {[1,2],
   [2,3],
   [3,2],
   [9,8],
   [8,10],
   [7,9,8],
   [7,10,5,9]
   ...
  }

where data do not have fixed dimensions.

When K = 2, the first 3 elements should be clustered as one group and the other 4 as another group.

I understand the k-means algorithm, but the problem is that its distance calculation is not suitable for my case. Because the elements have varying dimensions, I use Jaccard distance as the distance between every two elements.
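For reference, a minimal sketch of the Jaccard distance on this kind of data, assuming each element is treated as a set (so order and duplicates are ignored). The function name `jaccard_distance` and the convention for two empty sets are illustrative choices, not something from the question.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, each treated as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are considered identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


print(jaccard_distance([1, 2], [2, 3]))     # one shared value out of three: about 0.667
print(jaccard_distance([9, 8], [7, 9, 8]))  # two shared values out of three: about 0.333
print(jaccard_distance([1, 2], [9, 8]))     # nothing shared: 1.0
```

Because the distance depends only on set overlap, it works for elements of different lengths without any padding.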

Instead of computing means, one idea is to find the centroids of clusters. A centroid is a point which has the smallest sum of distances to all other points in a cluster.
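A point chosen this way is usually called a medoid. A rough sketch of that selection step, assuming Jaccard distance over sets and picking the first candidate on ties; the helper names `jaccard_distance` and `medoid` are hypothetical:

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, each treated as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty sets are identical
    return 1.0 - len(set_a & set_b) / len(union)


def medoid(cluster, dist=jaccard_distance):
    """Return the member of `cluster` with the smallest total distance to all members."""
    # min() keeps the first member on ties, so the result is deterministic.
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


cluster = [[9, 8], [8, 10], [7, 9, 8], [7, 10, 5, 9]]
print(medoid(cluster))  # [7, 9, 8]
```

Each call compares every pair in the cluster, so the cost grows quadratically with cluster size; caching the pairwise distances would likely help on larger inputs.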

I am working on a program based on the above idea, implementing k-means++ clustering. I want an algorithm that is stable (the output should not be extremely different on every run), relatively fast, and uses Jaccard distance.

I am here for advice because this is my first time doing data clustering, so maybe I am missing something. Please recommend a suitable algorithm if there is one, or point out my mistakes.

Upvotes: 0

Views: 157

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77454

Rather than k-means, which needs a fixed number of continuous-valued dimensions to compute means, why don't you use the much more appropriate hierarchical clustering, which can be used with Jaccard distance?
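As an illustration, here is a small pure-Python sketch of agglomerative (bottom-up) hierarchical clustering on the data from the question. The choice of average linkage, the stopping rule of exactly `k` clusters, and the function names are assumptions made for this example; a library implementation would be preferable for larger data sets.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two collections, each treated as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty sets are identical
    return 1.0 - len(set_a & set_b) / len(union)


def average_linkage(c1, c2, data):
    """Mean pairwise Jaccard distance between two clusters of indices."""
    total = sum(jaccard_distance(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data, k):
    """Start with singletons and merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # combinations() yields pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [[data[i] for i in c] for c in clusters]


data = [[1, 2], [2, 3], [3, 2], [9, 8], [8, 10], [7, 9, 8], [7, 10, 5, 9]]
for group in agglomerative(data, 2):
    print(group)
```

With `k = 2` this puts the first three elements in one group and the remaining four in the other. There is no random initialization, so repeated runs give the same result; the trade-off is speed, since this naive version recomputes linkage distances on every merge.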

Upvotes: 1
