Reputation: 2290
I am attempting to understand the concepts behind Google PageRank, and am attempting to implement a similar (though rudimentary) version in Python. I have spent the last few hours familiarizing myself with the algorithm, however it's still not all that clear.
I've located a particularly interesting website that outlines the implementation of PageRank in Python. However, I can't quite seem to understand the purpose of all of the functions shown on this page. Could anyone clarify what exactly the functions are doing, particularly pageRankeGenerator?
Upvotes: 1
Views: 6343
Reputation: 43110
I'll try to give a simple explanation (definition) of the PageRank algorithm from my personal notes.
Let us say that pages T1, T2, ... Tn are pointing to page A, then
PR(A) = (1-d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))
where
Every PR(x) can have start value 1 and we adjust the page ranks by repeating the algorithm ~10-20 times for each page.
Example for pages A, B, C:
A <--> B
^ /
\ v
C
Round 1
A = 0.15 + 0.85 (1/2 + 1/1) = 1.425
B = 0.15 + 0.85 (1/1) = 1
C = 0.15 + 0.85 (1/2) = 0.575
round's sum = 3
Round 2
A = 0.15 + 0.85 (1/2 + 0.575) = 1.06375
B = 0.15 + 0.85 (1.425) = 1.36125
C = 0.15 + 0.85 (1/2) = 0.575
round's sum = 3
Round 3
A = 0.15 + 0.85 (1.36125/2 + 0.575) = 1.217
B = 0.15 + 0.85 (1.06375) = 1.054
C = 0.728
round's sum = 3
...
Upvotes: 8