Quazi Marufur Rahman

Reputation: 2623

How to measure semantic relationship between two webpages

Let's assume I am visiting a university website that has many teacher profile pages. Although these pages are not syntactically related, they are semantically related. How can I measure this type of relationship? Which parameters should I focus on to find the relation?

Upvotes: 0

Views: 106

Answers (2)

Luca Mastrostefano

Reputation: 3281

Here is a simple but effective algorithm:

The page for each teacher, together with the pages it links to, surely contains text that characterizes that professor semantically. Suppose you build a bag of words by concatenating the text on the professor's page and on the linked pages (you can keep following links and concatenating text up to an arbitrary depth).
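As a rough illustration of the concatenation step, here is a minimal sketch that follows links breadth-first up to a chosen depth. It assumes the pages have already been fetched into memory; the function name `collect_text`, the `pages` and `links` dictionaries, and their contents are all hypothetical, made up for this example.

```python
from collections import deque


def collect_text(start, pages, links, max_depth=1):
    """Concatenate the text of `start` and of every page reachable
    from it by following at most `max_depth` links."""
    seen = {start}
    queue = deque([(start, 0)])
    parts = []
    while queue:
        page, depth = queue.popleft()
        parts.append(pages.get(page, ""))
        if depth == max_depth:
            continue  # do not follow links beyond the chosen depth
        for target in links.get(page, []):
            if target not in seen:  # avoid revisiting pages (link cycles)
                seen.add(target)
                queue.append((target, depth + 1))
    return " ".join(part for part in parts if part)


# Hypothetical, already-downloaded pages and their outgoing links.
pages = {
    "prof_a": "Professor A teaches databases",
    "pubs_a": "papers on query optimization",
    "lab": "database systems lab",
}
links = {
    "prof_a": ["pubs_a"],
    "pubs_a": ["lab", "prof_a"],
    "lab": [],
}

text_depth_1 = collect_text("prof_a", pages, links, max_depth=1)
```

With `max_depth=0` only the profile page itself is used; larger depths pull in more context but also more unrelated text, so the depth is something to tune.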

Now you can cluster the professors on the basis of the extracted information using a vector space model: each professor is represented by a vector whose components correspond to the words contained in the extracted pages and whose values are the term frequencies. Cosine similarity will do the rest of the job.
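To make the vector space idea concrete, here is a small sketch using only the standard library. It assumes raw term counts as the vector values and a very simple tokenizer; the helper names and the profile texts are made up for illustration, and a real setup would likely add stop-word removal or TF-IDF weighting.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Map each lowercase word in the text to its number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up profile texts, for illustration only.
profiles = {
    "prof_a": "machine learning and information retrieval research",
    "prof_b": "information retrieval and web search research",
    "prof_c": "organic chemistry laboratory",
}
vectors = {name: term_frequencies(text) for name, text in profiles.items()}

sim_ab = cosine_similarity(vectors["prof_a"], vectors["prof_b"])
sim_ac = cosine_similarity(vectors["prof_a"], vectors["prof_c"])
```

Here `prof_a` and `prof_b` share several terms and get a higher score than `prof_a` and `prof_c`, which share none. The pairwise scores can then be fed to any clustering method that accepts a similarity matrix.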

Upvotes: 0

miguelmalvarez

Reputation: 930

This SO post answers how to compute semantic similarity between phrases. In your case you just need to represent the different pages as documents and follow the same approach.

You can also exploit additional information, such as the links between pages or publications (in the case of researchers). I hope the link helps a bit...

Upvotes: 0
