Reputation: 55
I'm looking for an algorithm that does some sort of page ranking, but gives less value to pages as they get older.
All algorithms I have seen do the opposite (give older domains more value).
Help finding such an algorithm would be much appreciated.
Edit: Looking at my initial question, I think I was unclear about what I was asking, and the question is more complicated than I originally thought. Basically, I want a ranking algorithm where, if Site A links to Site B immediately after Site B makes a post, Site B's page gets extra page rank (maybe "score" is a better word), but if Site A links to Site B a long time after the post was made, the link adds very little to the page rank.
Hopefully this makes sense. Apologies for the initial question being wrong.
Upvotes: 4
Views: 391
Reputation: 178491
You can use biased page rank, as described by Haveliwala in this article.
The idea is simple: instead of using the regular uniform random component [1/n, 1/n, ..., 1/n], use a biased one. When the random walk takes a random jump, instead of going to each page with probability 1/n, it goes to each page with probability f(doc), where f(doc) is higher for newer pages and Sigma(f(doc)) = 1 over all the docs in the collection. Your random component will then be [f(doc1), f(doc2), ..., f(docn)].
Note that f(doc) > 0 must hold for every document; otherwise convergence is not guaranteed [the Perron-Frobenius theorem won't apply].
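To make the biased random component concrete, here is a minimal sketch in plain Python. The exponential decay with a `half_life` parameter is only one assumed choice of f, and the names `freshness_teleport` and `biased_pagerank` are made up for this illustration; they do not come from the article.

```python
import math

def freshness_teleport(ages, half_life=30.0):
    """Turn page ages (e.g. in days) into a jump distribution f.

    Exponential decay is an assumption; any f with f(doc) > 0
    and sum(f) == 1 would do.
    """
    raw = [0.5 ** (age / half_life) for age in ages]
    total = sum(raw)
    return [r / total for r in raw]

def biased_pagerank(links, teleport, damping=0.85, iters=100):
    """Power iteration where random jumps follow `teleport` instead of 1/n.

    links[i] is the list of page indices that page i links to.
    """
    n = len(links)
    rank = list(teleport)
    for _ in range(iters):
        # Random-jump part, biased toward newer pages.
        new = [(1 - damping) * t for t in teleport]
        for i, outs in enumerate(links):
            if outs:
                share = damping * rank[i] / len(outs)
                for j in outs:
                    new[j] += share
            else:
                # Dangling page: hand its rank out via the jump distribution.
                for j in range(n):
                    new[j] += damping * rank[i] * teleport[j]
        rank = new
    return rank

# Three pages in a cycle; page 0 is brand new, the others are 100 days old.
links = [[1], [2], [0]]
f = freshness_teleport([0, 100, 100])
ranks = biased_pagerank(links, f)
```

With a uniform jump vector the three pages in this cycle would tie; with the biased one, the new page 0 should come out ahead.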
Another possibility is to calculate regular PageRank and multiply it by a separate function g: Collection -> R that assigns a numerical value to each page: the newer the page, the higher its value.
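As a rough sketch of this second approach, assuming the regular PageRank scores are already computed and that g is a half-life decay (both the decay shape and the names `recency_weight` and `rescore` are assumptions for illustration):

```python
def recency_weight(age_days, half_life=30.0):
    """g(doc): 1.0 for a brand-new page, halving every `half_life` days."""
    return 0.5 ** (age_days / half_life)

def rescore(pagerank, ages, half_life=30.0):
    """Multiply each page's regular PageRank by its recency weight."""
    return [pr * recency_weight(age, half_life)
            for pr, age in zip(pagerank, ages)]

# Two pages with equal PageRank; the second one is 30 days old.
scores = rescore([0.5, 0.5], [0, 30])
```

Note that the rescored values no longer sum to 1, which is fine if they are only used for ordering.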
EDIT:
In response to the original question's edit:
Another possibility: when generating the graph for the web, add a weight function on the edges, w: E -> [0,1], denoting how important each link is. If the link was made shortly after the original post, w(e) will be close to 1, and if it was made much later, w(e) will be close to 0.
When creating the matrix you calculate PageRank on, set Matrix[v1][v2] <- w((v1,v2)) instead of a simple binary value indicating that the edge exists in the graph.
Once you have this matrix, calculate PageRank normally.
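A minimal sketch of the weighted-edge variant follows. It assumes w(e) = exp(-delay / tau), where delay is the time between the post and the link, and it assumes each page's outgoing weights are normalized to sum to 1 before iterating; neither detail is fixed by the description above, and `link_weight`, `weighted_pagerank`, and `tau` are hypothetical names.

```python
import math

def link_weight(delay_days, tau=7.0):
    """w(e) in (0, 1]: 1 for an immediate link, decaying toward 0 with delay."""
    return math.exp(-max(delay_days, 0.0) / tau)

def weighted_pagerank(n, edges, damping=0.85, iters=100):
    """PageRank over weighted edges given as (src, dst, weight) triples.

    Each page splits its rank among its out-links in proportion to w(e).
    """
    out_total = [0.0] * n
    for src, _, w in edges:
        out_total[src] += w
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        # Pages with no outgoing weight spread their rank uniformly.
        dangling = sum(rank[i] for i in range(n) if out_total[i] == 0.0)
        for j in range(n):
            new[j] += damping * dangling / n
        for src, dst, w in edges:
            new[dst] += damping * rank[src] * w / out_total[src]
        rank = new
    return rank

# Page 0 linked to page 1 right after its post, and to page 2 a month late.
edges = [(0, 1, link_weight(0)), (0, 2, link_weight(30))]
ranks = weighted_pagerank(3, edges)
```

One caveat of the normalization assumed here: only the relative weights among a page's out-links matter, so a page with a single late link still passes on its full rank. If that is undesirable, the unassigned share could instead be routed to the random jump.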
Upvotes: 5