user2778342
user2778342

Reputation: 15

Creating a similarity matrix based on data stored in a table in mysql

I have the following table in mysql:

Page; keywordId  
page1; 1  
page1; 3  
page1; 4  
page2; 4  
page3; 1  
page3; 3  
page3; 4  
page3; 5 

I would like to create a similarity matrix containing values for each page based on their keywords like shown below. The more keywords are shared between pages, the higher the number in the matrix.

      page1   page2   page3  
page1   -       1       3  
page2   1       -       1  
page3   3       1       -  

While searching for an answer, I found this entry on stackoverflow that deals with almost the same problem. However, the code is pretty dense and I could not reproduce it in order to solve my problem.

Any help would be appreciated.

Upvotes: 0

Views: 787

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

The "SQL-like" solution to what you want would look at pairs and have a separate row for each pair:

select ft1.page as page1, ft2.page as page2, count(*) as similarity
from followingtable ft1 join
     followingtable ft2
     on ft1.keywordid = ft2.keywordid and ft1.page <> ft2.page
group by ft1.page, ft2.page;

Pivoting this if you know the exact pages isn't hard. Pivoting this for a variable number of pages is tricky in SQL and requires using a prepared statement.

Upvotes: 2

Related Questions