AWRAM

Reputation: 333

Fuzzy c-means on categorical data

Can fuzzy c-means be applied to non-numerical data sets, i.e. categorical data or mixed numerical and categorical data? If yes (I hope so), how?

If not, what is the alternative? How can I fuzzy-cluster such data?

I really need an answer, please help.

NOTE: I've used the Jaccard coefficient to calculate the distance between two points, but I still don't see how to calculate the cluster centers. See the attachment: [image: Jaccard coefficient]
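For reference, a minimal sketch of a Jaccard distance between two categorical records, assuming each record is treated as a set of (attribute, value) pairs. The function name `jaccard_distance` and the sample records are hypothetical. It also shows where the trouble with centers comes from: the Jaccard distance is defined only between two records, whereas fuzzy c-means also needs a center that is a membership-weighted average of records, and there is no such average of sets of categories.

```python
def jaccard_distance(a: dict, b: dict) -> float:
    """Jaccard distance between two records, each viewed as a set of
    (attribute, value) pairs: 1 - |A & B| / |A | B|."""
    sa = set(a.items())
    sb = set(b.items())
    union = sa | sb
    if not union:
        return 0.0  # two empty records: treat them as identical
    return 1.0 - len(sa & sb) / len(union)


# Hypothetical records that differ only in "size"
x = {"color": "red", "size": "S", "shape": "round"}
y = {"color": "red", "size": "M", "shape": "round"}
print(jaccard_distance(x, y))  # 0.5
```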

Upvotes: 2

Views: 1826

Answers (1)

Fred Foo

Reputation: 363597

You'll have to transform your data into a numeric form. There are various ways of doing that, two of them being:

  • use vectors of feature counts (common in, e.g., text categorization)
  • use a one-hot representation, where a categorical feature that can take on n distinct values is represented as a string of n bits, with only the i'th bit set if the feature takes the i'th value in its allowed range.
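A minimal sketch of both encodings in plain Python. The helper names `one_hot` and `count_vector` and the sample records are hypothetical; real libraries offer ready-made encoders. For mixed data, one common option (an assumption here, not something the formulas require) is to scale the numeric attributes and append them next to the bits.

```python
from collections import Counter


def one_hot(records, attributes):
    """Encode categorical records as 0/1 vectors with one bit per
    (attribute, value) pair observed in the data."""
    columns = []
    for attr in attributes:
        # Allowed values of this attribute, in a fixed (sorted) order
        values = sorted({r[attr] for r in records})
        columns.extend((attr, v) for v in values)
    vectors = [[1.0 if r[attr] == v else 0.0 for attr, v in columns]
               for r in records]
    return columns, vectors


def count_vector(tokens, vocabulary):
    """Feature-count vector, e.g. word counts of a document."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocabulary]


records = [{"color": "red", "size": "S"},
           {"color": "blue", "size": "S"},
           {"color": "red", "size": "M"}]
columns, vectors = one_hot(records, ["color", "size"])
print(columns)  # [('color', 'blue'), ('color', 'red'), ('size', 'M'), ('size', 'S')]
print(vectors)  # [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
```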

Both are very common transformations that many machine learning programs do under the hood. Also, you might want to experiment with a metric other than the Euclidean one. Especially with a one-hot representation, but depending on the data, the L1 norm (Manhattan/city-block distance) may be more appropriate.
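To see how the two metrics behave on one-hot vectors, here is a small comparison (function names are illustrative). Each mismatched categorical attribute flips two bits, so the L1 distance is exactly twice the number of mismatched attributes, while the Euclidean distance is the square root of that. One caveat worth hedging: the usual weighted-mean center formula minimizes the fuzzy c-means objective only for squared Euclidean distance; with L1 the exact per-coordinate minimizer is a weighted median, so switching metrics also means revisiting the center update.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """L1 / city-block distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


# One-hot rows over (color=blue, color=red, size=M, size=S):
# the two records differ only in color.
u = [0.0, 1.0, 0.0, 1.0]  # color=red,  size=S
v = [1.0, 0.0, 0.0, 1.0]  # color=blue, size=S
print(manhattan(u, v))  # 2.0
print(euclidean(u, v))  # 1.4142135623730951
```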

Apart from that, just apply the given formulas to your transformed dataset.
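As an illustration, a minimal pure-Python sketch of standard fuzzy c-means (Euclidean distance, fuzzifier m) run on one-hot vectors. The function name `fuzzy_c_means`, its parameters, the random initialization and the 1e-12 distance floor are my own illustrative choices, not any particular library's API.

```python
import math
import random


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance (illustrative sketch).

    X: list of equal-length numeric vectors, e.g. one-hot encoded records.
    c: number of clusters; m > 1 is the fuzzifier.
    Returns (centers, U), where U[i][j] is the membership of X[i] in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    p = 2.0 / (m - 1.0)

    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])

    centers = []
    for _ in range(max_iter):
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * X[i][d] for i in range(n)) / wsum
                            for d in range(dim)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        # Distances are floored at 1e-12 so that a point lying exactly on a
        # center does not cause a division by zero.
        new_U = []
        for i in range(n):
            dist = [max(math.dist(X[i], ctr), 1e-12) for ctr in centers]
            new_U.append([1.0 / sum((dist[j] / dist[k]) ** p for k in range(c))
                          for j in range(c)])

        # Stop once memberships barely change between iterations.
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U


# One-hot vectors over (color=blue, color=red, size=M, size=S):
# three "red, S" records and three "blue, M" records.
X = [[0.0, 1.0, 0.0, 1.0]] * 3 + [[1.0, 0.0, 1.0, 0.0]] * 3
centers, U = fuzzy_c_means(X, c=2)
for row in U:
    print([round(u, 3) for u in row])
for ctr in centers:
    print([round(v, 3) for v in ctr])
```

Because each center is a weighted mean of one-hot rows, its entries within one attribute's block (e.g. the two color columns) sum to 1 and can be read as a membership-weighted frequency of each value in that cluster. That is how the "cluster centers" from the question become computable once the data is encoded.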

Upvotes: 4
