Mariya

Reputation: 847

Mixed variables (categorical and numerical) distance function

I want to apply fuzzy clustering to a set of jobs. The job attributes are:

  1. Categorical: position, diploma, skills
  2. Numerical: salary, years of experience

My question is: how do I calculate the distance between different jobs?
For example, job1(programmer, BS computer science, (java, .net, responsibility), 1500, 3)
and job2(tester, BS computer science, (black and white box testing), 1200, 1)

PS: I'm a beginner in data mining and clustering, so I would highly appreciate your help.

Upvotes: 10

Views: 10108

Answers (2)

Iterator

Reputation: 20570

Here is a good walk-through of several different clustering methods and how to use them in R: http://biocluster.ucr.edu/~tgirke/HTML_Presentations/Manuals/Clustering/clustering.pdf

In general, clustering discrete data relies either on counts (e.g. overlaps between vectors) or on some statistic derived from counts. As much as I'd like to address the statistical side, I suppose you're interested in the algorithm, so I'll leave it at that.
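To make the count-based idea concrete, here is a minimal Python sketch of one possible mixed distance for the two jobs in the question. It is only an illustration under stated assumptions, not a recommended or definitive method: single-valued categorical attributes use simple matching (0 if equal, 1 otherwise), the skill sets use Jaccard distance (an overlap count), and numerical attributes use the absolute difference divided by an assumed range, in the spirit of a Gower-style average. The names `jaccard_distance`, `mixed_distance`, the dictionary keys, and the `ranges` values are all hypothetical; in practice the ranges would come from your data.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mixed_distance(job1, job2, ranges):
    """Average of per-attribute distances, each scaled to [0, 1].

    `ranges` maps each numerical attribute to its (assumed) max - min
    over the whole data set, so that salary does not dominate.
    """
    parts = [
        # Simple matching for single-valued categorical attributes.
        0.0 if job1["position"] == job2["position"] else 1.0,
        0.0 if job1["diploma"] == job2["diploma"] else 1.0,
        # Overlap-based distance for the set-valued attribute.
        jaccard_distance(job1["skills"], job2["skills"]),
    ]
    # Range-normalized absolute difference for numerical attributes.
    for key in ("salary", "experience"):
        parts.append(abs(job1[key] - job2[key]) / ranges[key])
    return sum(parts) / len(parts)


# The two jobs from the question.
job1 = {
    "position": "programmer",
    "diploma": "bs computer science",
    "skills": {"java", ".net", "responsibility"},
    "salary": 1500,
    "experience": 3,
}
job2 = {
    "position": "tester",
    "diploma": "bs computer science",
    "skills": {"black and white box testing"},
    "salary": 1200,
    "experience": 1,
}

# Assumed ranges, purely for illustration.
ranges = {"salary": 1000, "experience": 10}

print(mixed_distance(job1, job2, ranges))
```

With these assumed ranges the per-attribute distances are 1, 0, 1, 0.3 and 0.2, so the average is 0.5 (up to floating-point rounding). Equal weighting of the attributes is itself a design choice; weights could be added if, say, skills should matter more than salary.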

Upvotes: 2

iinception

Reputation: 1955

You may take this as your starting point: http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. Distance between categorical data is nicely explained at the end.

Upvotes: 3
