Reputation: 847
I want to fuzzy cluster a set of jobs. Jobs Attributes are:
My question is: how to calculate the distance between different jobs?
e.g job1(programmer,bs computer science,(java ,.net,responsibility),1500, 3)
and job2(tester,bs computer science,(black and white box testing),1200,1)
PS: I'm beginner in data mining clustering, I highly appreciate your help.
Upvotes: 10
Views: 10108
Reputation: 20570
Here is a good walk-through of several different clustering methods and how to use them in R: http://biocluster.ucr.edu/~tgirke/HTML_Presentations/Manuals/Clustering/clustering.pdf
In general, clustering for discrete data is related to either the use of counts (e.g. overlaps in vectors) or related to some statistic derived from counts. As much as I'd like to address the statistical side, I suppose you're interested in the algorithm, so I'll leave it at that.
Upvotes: 2
Reputation: 1955
You may take this as your starting point: http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. Distance between categorical data is nicely explained at the end.
Upvotes: 3