Reputation: 2549
Using PHP, I fetched my friends lists from Facebook and Twitter and stored each list in an associative array. I have both their names and locations. I want to compare the friends from Facebook and Twitter based on name and location, and produce a similarity score.
For example, I want to set a threshold of about 0.7: if the score for a person is more than that, the two accounts represent the same entity. I have tried the PHP function similar_text, but it is too basic. It gives a 50-60% match for almost every friend, as it is based only on the words in the name.
Any suggestions?
Upvotes: 1
Views: 335
Reputation: 126
You may want to consider the vector space model: treat each name and location as one dimension in a very high-dimensional space. Represent Twitter as one vector and Facebook as another. If, for example, I have a friend named Mike on both Facebook and Twitter, the "Mike" dimension has a non-zero value in both vectors. By comparing the angle between these two vectors, I can compute a similarity score. A smaller angle indicates a higher degree of similarity. A simple example:
My twitter friends: Ada Alan Beth Dana Jon
My facebook friends: Anne Beth Dana Jon
Space contains dimensions: < Ada, Alan, Anne, Beth, Dana, Jon >
Twitter vector: t = < 1, 1, 0, 1, 1, 1 >
Facebook vector: f = < 0, 0, 1, 1, 1, 1 >
The angle between them is equal to ArcCos( [ f dot t ] / [ | f | * | t | ] )
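The example above can be sketched in PHP roughly as follows. This is only a minimal illustration, assuming binary vectors built from exact name matches; the helper names toVector and cosineSimilarity are made up for this sketch and are not built-in PHP functions.

```php
<?php
// Hypothetical helper: build a 0/1 vector over the shared dimensions.
function toVector(array $names, array $dimensions): array
{
    $present = array_flip($names);
    $vector = [];
    foreach ($dimensions as $dim) {
        $vector[] = isset($present[$dim]) ? 1 : 0;
    }
    return $vector;
}

// Hypothetical helper: cosine of the angle between two equal-length vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0;
    $normA = 0;
    $normB = 0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0 || $normB == 0) {
        return 0.0; // an empty vector has no direction, so treat it as no match
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

$twitter  = ['Ada', 'Alan', 'Beth', 'Dana', 'Jon'];
$facebook = ['Anne', 'Beth', 'Dana', 'Jon'];

// Dimensions: every distinct name, sorted so both vectors use the same order.
$dimensions = array_values(array_unique(array_merge($twitter, $facebook)));
sort($dimensions);

$t = toVector($twitter, $dimensions);  // 1, 1, 0, 1, 1, 1
$f = toVector($facebook, $dimensions); // 0, 0, 1, 1, 1, 1

$cosine = cosineSimilarity($f, $t);
$angle  = rad2deg(acos($cosine));

echo round($cosine, 3), "\n"; // 0.671
echo round($angle, 1), "\n";  // 47.9
```

For non-negative vectors like these, the cosine itself falls between 0 and 1, so it can be compared directly against a threshold such as 0.7 without converting to an angle.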
See https://en.wikipedia.org/wiki/Vector_space_model
Upvotes: 1