How can pagerank iterate in mapreduce model?

Question

I am confused how pagerank algorithm work with mapreduce model.

The main confusion is that after phaseII, the val is inlinks to the key URL(not the outlinks), so how can it work in the next iteration?

See my example below:

txt:
A->B
A->C
B->A
C->B

     WORKER1              WORKER2
LOAD
      A->B                B->A  
      A->C                C->B
MAP
     (A,B)                (B,A)
     (A,C)                (C,B)
SHUFFLE AND DISTRIBUTE
     (A,[B,C])            (B,[A])
                          (C,[B])
REDUCE
     (A,(PR(A),[B,C],2))  (B,(PR(B),[A],1))
                          (C,(PR(C),[B],1))
MAP(PHASE2)
     (B,(PR(A)/2,2))      (A,(PR(B)/1,1))
     (C,(PR(A)/2,2))      (B,(PR(C)/1,1))
SHUFFLED AND DISTRIBUTE
     (A,[PR(B)/1])        (B,[PR(A)/2,PR(C)/1])
                          (C,[PR(A)/2])
RERUCE
     (A,(NEWPR(A),[B],2)) (B,(NEWPR(B),[A,C],1))
                          (C,(NEWPR(C),[A],1))

Till now, I lose the outlinks info, where is my mistake?

zsxwing · Accepted Answer

You need a structure (node_id, page_rank, adjacency_list) to store the page link, PR and the adjacency list.

A good book to train your MapReduce thought is Data-Intensive Text Processingwith MapReduce. In 5.3 PAGERANK there is a detail about how to implement PageRank in MapReduce.

How can pagerank iterate in mapreduce model?

Answers (1)

Related Questions