Clustering non-numeric groups

Question

I am trying to group together parts of a data set that I am working with. I have a group of individuals (agents) who each work with a variety of different skills. The idea is to find the subset in which the largest percentage of agents and skills is represented.

So in a perfect scenario, I would get a sample of agents that accounts for 85-90% of the records, along with a group of skills that also accounts for 85-90% of the records. Basically, I want the largest possible sample that leaves out small groups of agents who work with only a few skills, and skills that only a very small percentage of agents work with.

I am trying to find a more statistical approach to doing this and thought about clustering. But from my understanding, clustering requires a distance definition, and I am not sure that this data would fit that requirement.
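For what it is worth, a distance can be defined on non-numeric data of this kind. One common choice for set-valued data is the Jaccard distance: treat each agent as the set of skills they work with and compare the sets. The sketch below is only an illustration under that assumption; the function name `jaccard_distance` and the three example skill sets are made up for this post, not taken from any library.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical (an assumption of this sketch).
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical skill sets, one per agent.
skills_by_agent = {
    1: {"Claims", "Benefits"},
    2: {"Claims"},
    3: {"Other"},
}

# Pairwise distances: 0.0 means identical skill sets, 1.0 means no overlap.
distances = {
    (i, j): jaccard_distance(skills_by_agent[i], skills_by_agent[j])
    for i in skills_by_agent
    for j in skills_by_agent
    if i < j
}
```

A matrix of such distances could then be passed to any clustering method that accepts precomputed distances, such as hierarchical clustering.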

Below is a small sample of what the data looks like:

      Agent          Skill
        1            Claims
        1            Benefits
        2            Claims
        2              -
        3            Other

