Reputation: 15
I have the following table in mysql:
Page; keywordId
page1; 1
page1; 3
page1; 4
page2; 4
page3; 1
page3; 3
page3; 4
page3; 5
I would like to create a similarity matrix containing values for each page based on their keywords like shown below. The more keywords are shared between pages, the higher the number in the matrix.
page1 page2 page3
page1 - 1 3
page2 1 - 1
page3 3 1 -
While searching for an answer, I found this entry on stackoverflow that deals with almost the same problem. However, the code is pretty dense and I could not reproduce it in order to solve my problem.
Any help would be appreciated.
Upvotes: 0
Views: 787
Reputation: 1270401
The "SQL-like" solution to what you want would look at pairs and have a separate row for each pair:
select ft1.page as page1, ft2.page as page2, count(*) as similarity
from followingtable ft1 join
followingtable ft2
on ft1.keywordid = ft2.keywordid and ft1.page <> ft2.page
group by ft1.page, ft2.page;
Pivoting this if you know the exact pages isn't hard. Pivoting this for a variable number of pages is tricky in SQL and requires using a prepared statement.
Upvotes: 2