Python mechanize to iterate over all webpages of a website

Question

I want to iterate over all the webpages of a website. I am trying to use mechanize here but it only looks over the main links of the website. How should I modify it?

import mechanize
import lxml.html

br = mechanize.Browser()
response = br.open("http://www.apple.com")

for link in br.links():
    print link.url
    br.follow_link(link)  # takes EITHER Link instance OR keyword args
    print br
    br.back()

This is the new code:

import mechanize
import lxml.html

links  = set()                             
visited_links  = set()


def visit(br, url):
  response = br.open(url)
  links = br.links()
  for link in links:
    if not link.url in links:
      visited_links.add(link.url)  
      visit(br, link)
      print link.url


if __name__ == '__main__':
  br = mechanize.Browser()
  visit(br,"http://www.apple.com")

Python mechanize to iterate over all webpages of a website

Answers (1)

Related Questions