Reputation: 131
I am attempting to create an algorithm that matches people to the leader of a group. I've discovered k-means clustering and think it is the way to go. The project is in JavaScript, so I've found a package on npm that implements k-means. I'm confused because I can't find any examples similar to my case: if I have 20 people who give scores to 4 people based on their ability to lead, how do I format my data so that k-means assigns the 20 people to groups?
(Screenshot of my data in Google Sheets)
To be precise: based on that screenshot, I am trying to map followers 2-20 to leaders L1-L4 based on the scores they gave the leaders (0, 0.5, 1, or 1.5, with 1.5 being the highest score, i.e. the shortest distance). Ideally the groups would be of similar size.
What I've tried:
var skmeans = require("skmeans");
var data = [[0.5,0.5,0,0],
[1.5,0,0.5,0],
[1.5,0,1.5,1],
[1.5,0.5,0,0],
[0.5,1.5,0,1],
[0.5,1.5,0.5,1],
[0.5,0.5,1,0],
[1,0,1,1],
[1.5,1.5,1,0.5],
[0.5,1,0.5,1],
[1,1,1,1],
[1.5,1.5,0.5,1],
[1,1.5,1,0.5],
[0,1.5,0.5,1.5],
[1.5,1,0.5,0],
[0.5,0,0,1.5],
[0.5,0,0,1.5],
[1.5,0.5,1.5,1],
[0.5,1.5,1,1]];
var res = skmeans(data,4);
But this just grouped the followers among themselves based on who scored the leaders similarly, instead of using the leaders as centroids. I'm open to other clustering approaches or, if I'm completely off target, to pointers to better algorithms for this task.
Upvotes: 0
Views: 231
Reputation: 978
K-means starts from 4 arbitrary points (4 because you asked for 4 clusters) and assigns each data point to the nearest of them, which forms 4 clusters. It then takes the mean of each cluster as that cluster's centroid for the next iteration and repeats. Since the first iteration starts from arbitrary points, the result you got is expected: followers are grouped by how similar their scores are, and the clusters have no link to specific leaders.
Passing your expected leaders as the initial centroids, instead of letting the algorithm pick arbitrary points, might help.
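To make the "mean of each cluster" step concrete, here is a minimal sketch of the centroid update in plain JavaScript. It is not taken from skmeans; the function name `updateCentroids` and the tiny sample are made up for illustration, and the empty-cluster handling is a simplification.

```javascript
// Recompute each centroid as the mean of the points currently assigned to it.
// `assignments[i]` is the cluster index of `points[i]`.
function updateCentroids(points, assignments, k) {
  const dims = points[0].length;
  const sums = Array.from({ length: k }, () => new Array(dims).fill(0));
  const counts = new Array(k).fill(0);

  points.forEach((point, i) => {
    const cluster = assignments[i];
    counts[cluster] += 1;
    point.forEach((value, d) => {
      sums[cluster][d] += value;
    });
  });

  // Simplification: an empty cluster gets a zero vector here.
  return sums.map((sum, c) =>
    sum.map((total) => (counts[c] ? total / counts[c] : 0))
  );
}

// Three score rows, with the first two assumed to share cluster 0.
const points = [[0.5, 0.5, 0, 0], [1.5, 0.5, 0, 0], [0.5, 0, 0, 1.5]];
const assignments = [0, 0, 1];
const centroids = updateCentroids(points, assignments, 2);
console.log(centroids); // [[1, 0.5, 0, 0], [0.5, 0, 0, 1.5]]
```

Because each centroid is recomputed from whichever followers landed in its cluster, it ends up describing a typical follower of that cluster rather than a leader.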
skmeans(data,k,[centroids],[iterations])
Reference: https://www.npmjs.com/package/skmeans#skmeansdatakcentroidsiterations
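As a rough sketch of what leader-based centroids could look like, the snippet below builds one seed point per leader and runs a single nearest-centroid assignment in plain JavaScript. The seed values are my own assumption (top score of 1.5 for that leader, 0 for the others), and `leaderSeeds`, `squaredDistance` and `assignToNearest` are hypothetical names, not part of skmeans.

```javascript
// Assumed seed centroids: an "ideal supporter" of each leader,
// i.e. 1.5 for that leader and 0 for everyone else.
const leaderSeeds = [
  [1.5, 0, 0, 0], // L1
  [0, 1.5, 0, 0], // L2
  [0, 0, 1.5, 0], // L3
  [0, 0, 0, 1.5], // L4
];

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  return a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0);
}

// One assignment step: for each point, the index of its nearest centroid.
function assignToNearest(points, centroids) {
  return points.map((point) => {
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
        best = c;
      }
    }
    return best;
  });
}

// Three rows from the question's data.
const sample = [
  [1.5, 0, 0.5, 0],
  [0.5, 1.5, 0, 1],
  [0.5, 0, 0, 1.5],
];
const groups = assignToNearest(sample, leaderSeeds);
console.log(groups); // [0, 1, 3] -> L1, L2, L4
```

Going by the signature above, the same seeds could be passed as the third argument, e.g. `skmeans(data, 4, leaderSeeds)`. Keep in mind that the centroids still move away from the seeds in later iterations, and nothing here forces the groups to be of similar size.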
Upvotes: 1