Reputation: 13
It's very easy to fetch a single web page, as I can see from Python's manual:
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()
But how do I fetch an entire site? Can anybody please provide the code?
Upvotes: 0
Views: 123
Reputation: 9863
You can use a combination of
You can extract the links on a web page, keep track of whether you have already visited each page and whether each URL belongs to the same site, and then fetch the links that qualify.
Keep in mind how many levels of links you need to follow in order to index the site. Without a depth limit, the number of pages you retrieve can grow exponentially.
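The steps above could be sketched roughly as follows. This is a minimal sketch, not a finished crawler: it assumes Python 3 (so `urllib.request` stands in for `urllib2`), uses only the standard library, and the names `LinkParser`, `fetch`, `crawl`, `max_depth` and `fetch_page` are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its text, assuming UTF-8 is good enough."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=2, fetch_page=fetch):
    """Breadth-first crawl of a single site; returns a dict {url: html}."""
    site = urlparse(start_url).netloc
    visited = {start_url}              # URLs already queued or fetched
    queue = deque([(start_url, 0)])    # (url, nesting level) pairs
    pages = {}
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue                   # skip pages that fail to download
        pages[url] = html
        if depth >= max_depth:
            continue                   # do not follow links past the limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == site and link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return pages
```

Calling `crawl('http://python.org/', max_depth=1)` would fetch the start page plus the same-site pages it links to directly. Passing a different `fetch_page` function lets you try the logic without touching the network.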
Upvotes: 0
Reputation: 137420
Use BeautifulSoup for parsing the site and repeat the process for every link unless it leads you outside of the domain.
Quite straightforward, but it gets complex if you also try to fetch dynamic content that has no links leading to it.
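For the link extraction step, something like the following might do. It is only a sketch: it assumes Python 3 with the third-party `beautifulsoup4` package installed, and `same_domain_links` is a hypothetical helper name, not part of BeautifulSoup.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def same_domain_links(html, page_url):
    """Return the absolute links on a page that stay on page_url's domain."""
    domain = urlparse(page_url).netloc
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Turn relative hrefs into absolute URLs before comparing domains.
        link = urljoin(page_url, anchor["href"])
        if urlparse(link).netloc == domain and link not in links:
            links.append(link)
    return links
```

You would call this on every page you download and repeat the fetch for each returned link you have not seen yet. Comparing `netloc` values treats `www.example.com` and `example.com` as different domains, so adjust the check if your site uses both.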
Upvotes: 1