I am trying to implement a page rank algorithm.
I have a total of 5 web pages (see image below). The following image represents a graph and shows which web page contains links to which other pages.
I have stored this linkage of web pages in a HashMap, in such a way that each web page's unique link is stored as the key, and a HashSet containing the links of all the web pages that the given web page points to is stored as the value for that key (see image below). Each web page is represented by its unique link. The above-mentioned HashMap is represented in code as
HashMap<URI, HashSet<URI>> graph = new HashMap<>();
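In case the image does not show up, the contents of this map for the five pages constructed in the main method below are roughly:
www.a.com -> [www.b.com, www.d.com]
www.b.com -> [www.c.com, www.d.com]
www.c.com -> []
www.d.com -> [www.a.com]
www.e.com -> [www.d.com]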
I have set the decay value to 0.85 and epsilon to 0.00001.
Problem
After generating the above-mentioned HashMap, I compute the page rank of each web page.
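For reference, the update rule I am trying to implement is the standard page rank formula, where d is the decay value and N is the total number of web pages, and where a page with no outgoing links spreads its rank evenly over all N pages:
newRank(p) = (1 - d) / N + d * sum over all pages q that link to p of ( oldRank(q) / outDegree(q) )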
Final converged page ranks should be
but my actual page ranks are
Page A = 0.3170604814815385
Page B = 0.18719407056490575
Page C = 0.13199010955519944
Page D = 0.31131469834360015
Page E = 0.05244064005475638
The actual value for each page is OK, in the sense that the difference between the actual and expected value is less than the selected epsilon, except for Page D.
I have tried different inputs to this page rank algorithm; no matter what I try, there are always one or two web pages whose page rank values are not correct. The algorithm returns before all the pages have converged page ranks, i.e. before the difference between the old rank and the new rank for every page is less than the epsilon value.
Question
What am I doing wrong? Why is my page rank algorithm returning page ranks before all of them have converged?
The following function generates the HashMap shown in the images above.
private static HashMap<URI, HashSet<URI>> makeGraph(HashSet<WebPage> webpages) {
    HashMap<URI, HashSet<URI>> webPagesGraph = new HashMap<>();
    HashSet<URI> singleWebPageLinks;
    HashSet<URI> availableWebPages = new HashSet<>();
    // add all the web pages available in the data set to a collection
    for (WebPage doc : webpages) {
        availableWebPages.add(doc.getUri());
    }
    for (WebPage doc : webpages) {
        singleWebPageLinks = new HashSet<>();
        for (URI link : doc.getLinks()) {
            // if link is not pointing to the web page itself and is available in the data set
            if (!link.equals(doc.getUri()) && availableWebPages.contains(link)) {
                singleWebPageLinks.add(link);
            }
        }
        webPagesGraph.put(doc.getUri(), singleWebPageLinks);
    }
    return webPagesGraph;
}
The following function computes the page ranks.
private static HashMap<URI, Double> makePageRanks(HashMap<URI, HashSet<URI>> graph,
                                                  double decay,
                                                  int limit,
                                                  double epsilon) {
    // Step 1: The initialize step should go here
    HashMap<URI, Double> oldPageRanks = new HashMap<>();
    HashMap<URI, Double> newPageRanks = new HashMap<>();
    double singleWebPageNewRank;
    int numLinkedPagesBySinglePage;
    double singleWebPageOldRank;
    boolean haveConverged = true;
    double rank;
    // provide ranks to each web page
    // initially the rank given to each page is 1/(total no. of web pages).
    // also give each page a new page rank equal to zero
    for (URI key : graph.keySet()) {
        oldPageRanks.put(key, (double) 1 / graph.size());
        newPageRanks.put(key, 0.0);
    }
    for (int i = 0; i < limit; i++) {
        // Step 2: The update step should go here
        for (URI uri : graph.keySet()) {
            singleWebPageOldRank = oldPageRanks.get(uri);
            numLinkedPagesBySinglePage = graph.get(uri).size();
            // if a web page doesn't have any outgoing links to any other
            // web page, increase the new page rank of every web page
            if (numLinkedPagesBySinglePage == 0) {
                for (URI u : newPageRanks.keySet()) {
                    singleWebPageNewRank = decay * (singleWebPageOldRank / graph.size());
                    saveNewRank(newPageRanks, u, singleWebPageNewRank);
                }
            } // otherwise increase the new page rank of every web page that is
              // pointed to by the current web page
            else {
                for (URI linkedWebPageURI : graph.get(uri)) {
                    singleWebPageNewRank = decay * (singleWebPageOldRank / numLinkedPagesBySinglePage);
                    saveNewRank(newPageRanks, linkedWebPageURI, singleWebPageNewRank);
                }
            }
        }
        // account for the random user/surfer by adding (1 - decay) / (total no. of web pages)
        // to each web page's new rank
        for (URI uri : newPageRanks.keySet()) {
            rank = newPageRanks.get(uri);
            rank = rank + ((1 - decay) / graph.size());
            newPageRanks.put(uri, rank);
            // check for convergence:
            // check whether the difference between the old rank and the new rank
            // for each web page is less than epsilon or not;
            // if the difference between old and new ranks is greater than or
            // equal to epsilon even for one web page, the ranks haven't converged
            if (oldPageRanks.get(uri) - newPageRanks.get(uri) >= epsilon) {
                haveConverged = false;
            }
        }
        if (haveConverged) {
            return oldPageRanks;
        } else {
            haveConverged = true;
            overWriteOldRanksWithNewRanks(oldPageRanks, newPageRanks);
        }
    }
    return oldPageRanks;
}
The following two functions are utility functions that are called from within the makePageRanks function.
// save the new page rank for a given web page by adding the passed new page rank to
// its previously saved page rank and then saving the new rank
private static void saveNewRank(HashMap<URI, Double> newPageRanks, URI pageURI, double pageNewRank) {
    pageNewRank += newPageRanks.get(pageURI);
    newPageRanks.put(pageURI, pageNewRank);
}

// overwrite old page ranks for the next iteration
private static void overWriteOldRanksWithNewRanks(HashMap<URI, Double> oldRanks, HashMap<URI, Double> newRanks) {
    for (URI key : newRanks.keySet()) {
        oldRanks.put(key, newRanks.get(key));
        // make the new rank of each web page equal to zero before the next iteration
        newRanks.put(key, 0.0);
    }
}
The following is the simple WebPage class.
public class WebPage {

    private ArrayList<String> words;
    private URI uri;
    private ArrayList<URI> links;

    WebPage(URI uri, ArrayList<String> words, ArrayList<URI> links) {
        this.words = words;
        this.uri = uri;
        this.links = links;
    }

    public ArrayList<String> getWords() {
        return words;
    }

    public URI getUri() {
        return uri;
    }

    public ArrayList<URI> getLinks() {
        return links;
    }
}
Finally, here is the main method, for anyone wanting to see what input I am giving to the page rank algorithm.
public static void main(String[] args) {
    ArrayList<URI> pageALinks = new ArrayList<>();
    pageALinks.add(createURI("www.b.com"));
    pageALinks.add(createURI("www.d.com"));
    URI pageAURI = createURI("www.a.com");
    WebPage pageA = new WebPage(pageAURI, new ArrayList<>(), pageALinks);

    ArrayList<URI> pageBLinks = new ArrayList<>();
    pageBLinks.add(createURI("www.c.com"));
    pageBLinks.add(createURI("www.d.com"));
    URI pageBURI = createURI("www.b.com");
    WebPage pageB = new WebPage(pageBURI, new ArrayList<>(), pageBLinks);

    ArrayList<URI> pageCLinks = new ArrayList<>();
    URI pageCURI = createURI("www.c.com");
    WebPage pageC = new WebPage(pageCURI, new ArrayList<>(), pageCLinks);

    ArrayList<URI> pageDLinks = new ArrayList<>();
    pageDLinks.add(createURI("www.a.com"));
    URI pageDURI = createURI("www.d.com");
    WebPage pageD = new WebPage(pageDURI, new ArrayList<>(), pageDLinks);

    ArrayList<URI> pageELinks = new ArrayList<>();
    pageELinks.add(createURI("www.d.com"));
    URI pageEURI = createURI("www.e.com");
    WebPage pageE = new WebPage(pageEURI, new ArrayList<>(), pageELinks);

    HashSet<WebPage> pages = new HashSet<>();
    pages.add(pageA);
    pages.add(pageB);
    pages.add(pageC);
    pages.add(pageD);
    pages.add(pageE);

    HashMap<URI, HashSet<URI>> graph = makeGraph(pages);
    HashMap<URI, Double> map = makePageRanks(graph, 0.85, 100, 0.00001);
}
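The main method doesn't print anything; to look at the computed ranks, something like the following can be appended at the end (a minimal sketch, assuming Java 8 or later):
map.forEach((uri, rank) -> System.out.println(uri + " = " + rank));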
Upvotes: 0
Views: 714
Summary:
You're testing for the wrong value. You have to reduce your code's epsilon value to make the page rank get within 0.00001 of the desired value. Two successive guesses within 0.00001 of each other do not imply that result.
In addition to the problems I've noted in comments, I believe I see your problem: it's a conceptual problem with convergence. It appears that the requirement of the unit test is to converge to within epsilon of a predetermined value. You have not written an algorithm for that. Your test
if (oldPageRanks.get(uri) - newPageRanks.get(uri) >= epsilon)
checks whether two successive approximations are within that value of each other. This does not guarantee that the new page rank is within epsilon of the ultimate value. The calculus/topological definition of a "close" neighborhood reads something like the following, for a guess x and a reference (correct) point z:
abs(x - z) < delta ==> abs(f(x) - f(z)) < epsilon
You may have confused delta and epsilon.
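To put a rough number on it: assuming the iteration is a contraction with a factor of about your decay value d = 0.85, the standard a-posteriori estimate is
abs(newRank - trueRank) <= (d / (1 - d)) * abs(newRank - oldRank)
so two successive iterates within 0.00001 of each other only guarantee that the result is within roughly 0.000057 of the converged value.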
If the gradient of your approximation function is outside the range [-1, +1], then you will likely trip over this mistake. You need to find the delta value for which this holds, and then use that quantity in place of your current epsilon. This is a simple change in the epsilon value that you feed to your function.
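As a sketch only, reusing the call from your main method (the exact factor to shrink by depends on how quickly your iteration contracts, so the division by 10 here is just an illustrative guess):
HashMap<URI, Double> map = makePageRanks(graph, 0.85, 100, 0.00001 / 10);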
Upvotes: 2