Reputation: 159
I'm working on a simplified website downloader (programming assignment): I have to recursively follow the links starting from a given URL and download the individual pages to my local directory.
I already have a function that retrieves all the hyperlinks (href attributes) from a single page, Set<String> retrieveLinksOnPage(URL url), which returns a set of hyperlinks. I have been told to download pages up to level 4 (level 0 being the home page). So I basically want to retrieve all the links on the site, but I'm having difficulty coming up with the recursion algorithm. In the end, I intend to call my function like this:
retrieveAllLinksFromSite("http://www.example.com/ldsjf.html",0)
Set<String> Links = new HashSet<String>();

Set<String> retrieveAllLinksFromSite(URL url, int Level, Set<String> Links)
{
    if (Level == 4)
        return Links;
    else {
        // retrieveLinksOnPage(url);
        // I'm pretty lost here, actually!
    }
}
Thanks!
Upvotes: 1
Views: 2301
Reputation: 8969
Here is the pseudocode:
Set<String> retrieveAllLinksFromSite(int Level, Set<String> Links) {
    if (Level < 5) {
        Set<String> local_links = new HashSet<String>();
        for (String link : Links) {
            // download the page at 'link'
            Set<String> new_links = ...; // parse the downloaded HTML of 'link' for its hrefs
            local_links.addAll(new_links);                                      // keep the links found at this level
            local_links.addAll(retrieveAllLinksFromSite(Level + 1, new_links)); // plus everything found deeper
        }
        return local_links;
    } else {
        return Links;
    }
}
You will need to implement the things in the comments yourself. To run the function from a single starting link, create an initial set of links that contains only that link. However, it also works if you have multiple initial links.
Set<String> initial_link_set = new HashSet<String>();
initial_link_set.add("http://abc.com/");
Set<String> final_link_set = retrieveAllLinksFromSite(1, initial_link_set);
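The two comments are the parts you have to fill in yourself. If you want a self-contained placeholder to test with, here is a minimal sketch that uses only the standard library (Java 11's HttpClient); the regex-based href extraction, the hash-based file names and the target directory are illustrative assumptions only, and a real HTML parser such as jsoup would be far more robust:

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageFetcher {
    // Naive href pattern; only catches absolute, double-quoted URLs.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // "download the link": fetch the page body and save a copy locally.
    static String download(String link, Path saveDir) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(link)).build();
        String html = CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
        Files.createDirectories(saveDir);
        Path file = saveDir.resolve(Integer.toHexString(link.hashCode()) + ".html"); // arbitrary naming scheme
        Files.write(file, html.getBytes(StandardCharsets.UTF_8));
        return html;
    }

    // "parse the downloaded HTML": pull the href values out of the page.
    static Set<String> parseLinks(String html) {
        Set<String> links = new HashSet<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}

Inside the loop of the pseudocode, the two commented lines then become String html = PageFetcher.download(link, Path.of("downloads")); and Set<String> new_links = PageFetcher.parseLinks(html); (the checked exceptions still have to be handled or declared by the caller).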
Upvotes: 3
Reputation: 6860
You can use a HashMap instead of a Vector to store the links together with their levels (since you need to recursively collect all links down to level 4). It would be something like this (just giving an overall hint):
Map<String, Integer> Links = new HashMap<String, Integer>();

void retrieveAllLinksFromSite(URL url, int Level)
{
    if (Level == 4)
        return;
    else {
        // retrieve the links on the current page, and for each retrieved link do {
        //     download the link
        //     Links.put(the retrieved url, Level)                     // store the link with its level in the map
        //     retrieveAllLinksFromSite(the retrieved url, Level + 1)  // recurse for the further levels
        // }
    }
}
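For completeness, here is that hint turned into a compilable sketch. It assumes the retrieveLinksOnPage(URL) method from the question does the parsing (stubbed here only so the class compiles), and downloadPage is a hypothetical helper whose saving logic is left to the assignment:

import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class SiteCrawler {
    // Maps each discovered link to the level at which it was found.
    private final Map<String, Integer> links = new HashMap<String, Integer>();

    void retrieveAllLinksFromSite(URL url, int level) {
        if (level == 4) {
            return; // do not go past level 4
        }
        for (String link : retrieveLinksOnPage(url)) {
            if (links.containsKey(link)) {
                continue; // already seen; prevents endless recursion on circular links
            }
            links.put(link, level + 1); // a link found on a level-N page points to a level-(N+1) page
            downloadPage(link);         // hypothetical helper: save the page to the local directory
            try {
                retrieveAllLinksFromSite(new URL(link), level + 1);
            } catch (MalformedURLException e) {
                // skip links that are not valid absolute URLs
            }
        }
    }

    // The question says this already exists; stubbed only so the sketch compiles.
    Set<String> retrieveLinksOnPage(URL url) {
        throw new UnsupportedOperationException("provided separately");
    }

    // Hypothetical helper; the actual download/save logic is up to the assignment.
    void downloadPage(String link) {
    }
}

Note that the visited check on the map also keeps the crawler from looping forever when pages link back to each other.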
Upvotes: 0