Sully

Reputation: 179

Clustering of facebook-users with k-means

I got a list of Facebook user IDs from the following page:

Stanford Facebook-Data

If you look at the facebook_combined data, you can see that it is a list of user connections (edges). So, for instance, user 0 is connected to users 1, 2, 3 and so on.

Now my task is to find clusters in the dataset.

In the first step I used Node.js to read the file and save the data in an array like this:

array=[[0,1],[0,2], ...]
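A minimal sketch of that parsing step, assuming each line of the file holds two whitespace-separated user IDs (for example `0 1`). The function name `parseEdgeList` and the inline sample text are hypothetical and only serve as illustration:

```javascript
// Parse lines of "source target" into [source, target] number pairs.
// Assumption: IDs are integers separated by whitespace, one edge per line.
function parseEdgeList(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => line.split(/\s+/).map(Number));
}

// Small inline sample instead of the real file.
const sampleText = '0 1\n0 2\n1 2\n';
const sampleEdges = parseEdgeList(sampleText);
console.log(sampleEdges); // [ [ 0, 1 ], [ 0, 2 ], [ 1, 2 ] ]
```

For the real data, the text could be loaded with `require('fs').readFileSync(path, 'utf8')`, where `path` points to the downloaded file.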

In the second step I used a k-means plugin for Node.js to cluster the data:

Cluster-Plugin

But I don't know whether the result is right, because now I get clusters of edges and not clusters of users.

UPDATE:

I am trying out a Markov implementation for Node.js. The Markov plugin, however, needs an adjacency matrix to build clusters. I implemented an algorithm in Java to save the matrix in a file.
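For reference, building the adjacency matrix from the edge array could also be done directly in Node.js. This is only a sketch under two assumptions: the connections are undirected, and user IDs are contiguous integers starting at 0. The name `buildAdjacencyMatrix` is hypothetical:

```javascript
// Build a dense, symmetric 0/1 adjacency matrix from [a, b] edge pairs.
// Assumptions: undirected edges, integer IDs in the range 0..n-1.
function buildAdjacencyMatrix(edges) {
  // The matrix size is the largest ID seen plus one.
  let n = 0;
  for (const [a, b] of edges) {
    n = Math.max(n, a + 1, b + 1);
  }

  // Each row gets its own array so that rows are not shared references.
  const matrix = Array.from({ length: n }, () => new Array(n).fill(0));

  for (const [a, b] of edges) {
    matrix[a][b] = 1;
    matrix[b][a] = 1; // mirror the entry because the edge is undirected
  }
  return matrix;
}

const smallMatrix = buildAdjacencyMatrix([[0, 1], [0, 2]]);
console.log(smallMatrix); // [ [ 0, 1, 1 ], [ 1, 0, 0 ], [ 1, 0, 0 ] ]
```

Each row of this matrix describes one user rather than one edge, so the unit being clustered becomes the user.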

Maybe you have another suggestion for how I could get clusters out of edges.

Upvotes: 0

Views: 432

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77475

K-means assumes your input data is in an R^d vector space.

In fact, it requires the data to be this way, because it computes means as cluster centers, hence the name k-means.

So if you want to use k-means, then you need

  1. One row per datapoint (not an edge list)
  2. A fixed-dimensionality data space where the mean is a useful center (usually you should have continuous attributes; on binary data the mean does not make much sense) and where least squares is a meaningful optimization criterion (again, on binary data, least squares does not have strong theoretical support)
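To illustrate the point about binary data, here is a small sketch that computes the mean of two binary rows, which is what k-means would use as a cluster center. The function name `meanVector` and the example rows are made up for illustration:

```javascript
// Component-wise mean of equally long numeric rows.
function meanVector(rows) {
  const d = rows[0].length;
  const sums = new Array(d).fill(0);
  for (const row of rows) {
    for (let j = 0; j < d; j++) {
      sums[j] += row[j];
    }
  }
  return sums.map((s) => s / rows.length);
}

// Two hypothetical users, each described by a 0/1 "is connected to" row.
const binaryRows = [
  [0, 1, 1, 0],
  [1, 0, 1, 0],
];
console.log(meanVector(binaryRows)); // [ 0.5, 0.5, 1, 0 ]
```

The resulting center contains values such as 0.5, which do not correspond to any possible user row. That is one way to see why the mean is a questionable center for this kind of data.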

On your Facebook data, you could try some embedding, but I'd have doubts about the trustworthiness of the result.

Upvotes: 1
