Bogdan

Reputation: 13

Fetch a whole site with Python

It's very easy to fetch a single web page, as I can see from Python's manual:

import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()

But how do I fetch a whole site? Can anybody please provide the code?

Upvotes: 0

Views: 123

Answers (2)

Kartik

Reputation: 9863

You can use a combination of

Extract the links on each web page, keep track of which pages you have already visited, check whether each URL belongs to the same site, and fetch only the same-site pages you have not seen yet.

Keep in mind how many levels of links you need to follow in order to index the site. Without a depth limit, the number of pages you retrieve can grow exponentially.
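
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a complete crawler: the names `LinkParser`, `fetch`, `crawl` and the `max_depth` parameter are made up for this example, "same site" is assumed to mean "same host name", and it uses Python 3's `urllib.request` (the successor of the `urllib2` module from the question) together with the standard library's `html.parser`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=2, fetch=fetch):
    """Breadth-first crawl of a single site; returns {url: html}.

    max_depth is the nesting limit: links found on pages that are
    already max_depth clicks away from start_url are not followed.
    """
    site = urlparse(start_url).netloc    # assumption: same site == same host
    visited = {start_url}                # URLs already queued or fetched
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except OSError:                  # urllib's URLError/HTTPError are OSErrors
            continue                     # skip pages that fail to download
        pages[url] = html
        if depth >= max_depth:
            continue                     # depth limit reached: do not follow links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == site and link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return pages
```

Something like `pages = crawl('http://python.org/', max_depth=1)` would then fetch the start page plus the same-host pages it links to directly. A real crawl should probably also respect robots.txt and pause between requests.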

Upvotes: 0

Tadeck

Reputation: 137420

Use BeautifulSoup to parse each page, and repeat the process for every link unless it leads outside the domain.

This is quite straightforward, but it gets complex if you also try to fetch dynamic content that has no links leading to it.
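
As a rough sketch of the link-extraction step with BeautifulSoup: the function name `internal_links` is invented for this example, the third-party `bs4` package (`pip install beautifulsoup4`) is assumed to be installed, and "inside the domain" is assumed to mean "same host name as the page".

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup  # third-party package, assumed installed


def internal_links(html, page_url):
    """Return the absolute URLs linked from a page that stay on its domain."""
    domain = urlparse(page_url).netloc   # assumption: domain == host name
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    # href=True skips <a> tags that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        absolute = urljoin(page_url, anchor["href"])  # resolve relative links
        if urlparse(absolute).netloc == domain:
            links.add(absolute)
    return links
```

Calling this on every fetched page and fetching each returned URL that has not been seen before gives the repeat-for-every-link loop. Content that is only reachable through JavaScript or form submissions has no `<a href>` for this to find, which is where the complexity mentioned above comes in.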

Upvotes: 1
