bytebiscuit
bytebiscuit

Reputation: 3496

page rank algo implementation in R

I'm trying to implement page rank algorithm in R using the following steps:

  1. Load a sample graph such as this one:

    0 1
    0 2
    0 3
    1 2
    1 5
    2 0
    2 4
    3 1
    3 0
    3 4
    4 1
    4 5
    5 2
    5 3
    
  2. Create an adjacency matrix out of this graph

  3. Create a Markov Chain (transition matrix)
  4. Find the stationary distribution and normalize it

The following is the code that implement all these steps:

g = read.graph(x)

a = get.adjacency(g)

markov = a / rowSums(a)

e = eigen(t(markov))

v <- e$vec[,1]

normalized <- v / sum(v)

when I compare the vector from the normalized object to the vector produced by page.rank(g) for this particular graph they are pretty much the same with minor differences. However when i try it on this graph:

    0 1
    0 2
    0 3
    1 2
    1 5
    2 0
    2 4
    3 1
    3 0
    3 4
    4 1
    4 5
    5 2
    5 3
    6 1
    6 2
    6 5
    6 0
    7 3
    7 4
    7 6
    7 7
    7 1
    8 2
    8 5
    9 8 
    9 7
    9 1 
    9 5
    10 2 
    10 3
    10 9

The difference is huge!

Anyone has an explanation for this, or an alternative implementation to this algorithm in R.

Upvotes: 1

Views: 7882

Answers (2)

vonjd
vonjd

Reputation: 4380

@Roc is incorrect in stating that you use a damping factor of 0, the opposite is true: you use a damping factor of 1.

When running the following code you get the same results for three different methods (igraph, raising matrix to n't power and eigenvector):

library(igraph)
library(expm)

set.seed(1415)
n <- 10
g <- sample_gnp(n, p = 1/4, directed = TRUE) # create random graph
df <- data.frame(pr = page_rank(g, damping = 1)$vector)

r <- c(1, rep(0, (n-1)))
adj_m <- t(as_adjacency_matrix(g, sparse = FALSE))
adj_m_mod <- prop.table(adj_m, 2)

lr <- eigen(adj_m_mod)$vectors[ , 1]
lr <- Re(lr/sum(lr))

matrix(lr, ncol = 1)
##             [,1]
##  [1,] 0.27663551
##  [2,] 0.02429907
##  [3,] 0.08878505
##  [4,] 0.06915888
##  [5,] 0.14579439
##  [6,] 0.10654206
##  [7,] 0.06915888
##  [8,] 0.07289720
##  [9,] 0.05327103
## [10,] 0.09345794
adj_m_mod %^% 100 %*% r
##             [,1]
##  [1,] 0.27663574
##  [2,] 0.02429905
##  [3,] 0.08878509
##  [4,] 0.06915881
##  [5,] 0.14579434
##  [6,] 0.10654199
##  [7,] 0.06915881
##  [8,] 0.07289723
##  [9,] 0.05327107
## [10,] 0.09345787
df
##            pr
## 1  0.27663551
## 2  0.02429907
## 3  0.08878505
## 4  0.06915888
## 5  0.14579439
## 6  0.10654206
## 7  0.06915888
## 8  0.07289720
## 9  0.05327103
## 10 0.09345794

One additional point: you always have to be careful how the adjacency matrix is defined, i.e. if incoming and outgoing links are in the rows or columns. To transform one form into the other use the transpose function t().

EDIT
I published a blog post about the pagerank algorithm in R:
http://blog.ephorie.de/googles-eigenvector-or-how-a-random-surfer-finds-the-most-relevant-webpages

Upvotes: 0

Roc
Roc

Reputation: 86

The reason is the damping parameter.

Your code is not using damping at all. beta=0. page.rank is using beta=0.85 by default.

If you use the following code, which uses damping (beta variable), you will get same results that page.rank does. Or you can modify your code with something like M=beta*M+(1-beta)*U and apply eigenvectors technique. (If some column is equals to 0 vector then you have to modify your Matrix with 1/n in this column before adding the damping effect).

I used your first example to show three different ways to get the same exact results. With no minor differences.

Your way using eigenvectors, the page.rank function and a different way using matrix iteration.

Here is the code:

g <- graph(c(
  1, 2, 1, 3, 1, 4, 
  2, 3, 2, 6, 3, 1, 
  3, 5, 4, 2, 4, 1, 
  4, 5, 5, 2, 5, 6, 
  6, 3, 6, 4), 
            directed=TRUE)

M = get.adjacency(g, sparse = FALSE)
M = t(M / rowSums(M))
n = nrow(M)

U = matrix(data=rep(1/n, n^2), nrow=n, ncol=n)
beta=0.85
A = beta*M+(1-beta)*U
e = eigen(A)
v <- e$vec[,1]
v <- as.numeric(v) / sum(as.numeric(v))
v

page.rank(g)$vector

library(expm)
n = nrow(M)
U = matrix(data=rep(1/n, n^2), nrow=n, ncol=n)
beta=0.85
A = beta*M+(1-beta)*U
r = matrix(data=rep(1/n, n), nrow=n, ncol=1)
t(A%^%100 %*% r)

Upvotes: 5

Related Questions