kamaci

Reputation: 75127

How to Implement K-Means Clustering Algorithm for MFCC Features?

I extracted features from some sound samples with the MFCC algorithm, and I want to cluster them with K-Means. Each voice sample has 70 frames, and every frame has 9 cepstral coefficients, so each sample is a 70*9 matrix.

Let's assume that A, B and C are the voice recordings, so

A is:

List<List<Double>> -> 70*9 array (I can use Vector instead of List)

B and C have the same dimensions.

I don't want to cluster individual frames; I want to cluster whole blocks of frames (in my example, one block has 70 frames).

How can I implement this with K-Means in Java?

Upvotes: 3

Views: 2552

Answers (2)

Has QUIT--Anony-Mousse

Reputation: 77454

K-means makes some fairly strong assumptions about your data, and I'm not convinced that your data is suitable for it.

  1. K-means is designed for Euclidean distance, and there might be a more appropriate distance measure for your data.
  2. K-means needs to be able to compute sensible means, which may not be meaningful for your data.
  3. Many distance functions (and algorithms!) don't work well in 70*9 = 630 dimensions ("curse of dimensionality").
  4. You need to know k beforehand.

Side note: stay away from Java generics over boxed primitive types such as Double. Boxing kills performance. Use double[][] instead.
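If you still want to try it, here is a minimal sketch of plain k-means (Lloyd's algorithm with squared Euclidean distance) on primitive arrays. It assumes each recording is first flattened into one fixed-length double[] and that k is not larger than the number of recordings. The class and method names (KMeansSketch, flatten, cluster) are made up for illustration, and the caveats listed above still apply.

```java
import java.util.Arrays;
import java.util.Random;

final class KMeansSketch {

    /** Flattens a frames-by-coefficients matrix (e.g. 70 x 9) into one vector, row by row. */
    static double[] flatten(double[][] frames) {
        int cols = frames[0].length;
        double[] out = new double[frames.length * cols];
        for (int i = 0; i < frames.length; i++) {
            System.arraycopy(frames[i], 0, out, i * cols, cols);
        }
        return out;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Runs Lloyd's algorithm and returns the cluster index of every point. Requires k <= points.length. */
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        int dim = points[0].length;
        Random rnd = new Random(seed);

        // Initialise centroids with k distinct input points (partial Fisher-Yates shuffle).
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rnd.nextInt(n - c);
            int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
            centroids[c] = points[idx[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged

            // Update step: move every centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster: keep the old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }
}
```

To use it, build a double[][] with one flattened recording per row (flatten(A), flatten(B), flatten(C), ...) and call cluster(rows, k, 100, 42L). The result depends on the random initialisation, so it is worth running with several seeds.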

Upvotes: 0

Nicolas78

Reputation: 5144

Here's where your knowledge of the problem domain becomes crucial. You might just use a distance between the 70*9 matrices, but you can probably do better. I don't know the particular features you mention, but some generic examples might be the average and standard deviation of the 70 values per feature. You're basically looking to reduce the number of dimensions, both to improve speed and to make the measure robust against simple transformations, like offsetting all values by one step.
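As a rough sketch of that idea, the following reduces one 70*9 matrix to 18 numbers: the mean and the (population) standard deviation of each coefficient across all frames. The class and method names are hypothetical, and whether these two statistics are the right summary for MFCCs is an assumption you would need to check against your data.

```java
final class MfccSummary {

    /**
     * Summarises a frames-by-coefficients matrix as
     * [mean_0 .. mean_{c-1}, std_0 .. std_{c-1}], computed per coefficient over all frames.
     */
    static double[] summarize(double[][] frames) {
        int n = frames.length;
        int c = frames[0].length;
        double[] out = new double[2 * c];

        // First pass: per-coefficient mean.
        for (double[] frame : frames) {
            for (int j = 0; j < c; j++) out[j] += frame[j];
        }
        for (int j = 0; j < c; j++) out[j] /= n;

        // Second pass: per-coefficient population standard deviation.
        for (double[] frame : frames) {
            for (int j = 0; j < c; j++) {
                double d = frame[j] - out[j];
                out[c + j] += d * d;
            }
        }
        for (int j = 0; j < c; j++) out[c + j] = Math.sqrt(out[c + j] / n);

        return out;
    }
}
```

Because both statistics ignore the order of the frames, a recording whose frames are shifted by a step produces almost the same vector. Each recording then becomes a single 18-dimensional point that any clustering routine can take as input.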

Upvotes: 3
