ReginaldJ

Reputation: 33

How can I iterate through the pages of a website using Python?

I'm new to software development, and I'm not sure how to go about this. I want to visit every page of a website and grab a specific bit of data from each one. My problem is that I don't know how to iterate through all of the existing pages without knowing the individual URLs ahead of time. For example, I want to visit every page whose URL starts with

"http://stackoverflow.com/questions/"

Is there a way to compile a list and then iterate through that, or is it possible to do this without creating a giant list of URLs?

Upvotes: 0

Views: 12795

Answers (3)

jfs

Reputation: 414179

To grab a specific bit of data from a website, you could use a web scraping tool such as Scrapy.

If the required data is generated by JavaScript, then you might need a browser-like tool such as Selenium WebDriver, and you would have to implement the crawling of the links by hand.
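As a rough illustration of what crawling the links by hand could look like for static pages, here is a minimal sketch that uses only the Python 3 standard library. The names `LinkParser`, `fetch`, `crawl`, and the `max_pages` limit are made up for this example and do not come from Scrapy or Selenium. The crawler discovers URLs as it goes, so no list has to be compiled ahead of time, although it does remember the URLs it has already seen.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, prefix, fetch_page=fetch, max_pages=100):
    """Yield (url, html) for each reachable page whose URL starts with prefix."""
    seen = {start_url}           # URLs already queued, to avoid revisiting
    queue = deque([start_url])   # breadth-first frontier
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load
        visited += 1
        yield url, html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
```

Because `crawl` is a generator, you can loop over it with `for url, html in crawl(start, prefix):` and extract your data from `html` one page at a time. The `fetch_page` parameter is there so a different downloader (for example one backed by a browser driver) can be swapped in.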

Upvotes: 0

Blender

Reputation: 298166

Try Scrapy.

It handles all of the crawling for you and lets you focus on processing the data, not extracting it. Instead of copy-pasting the code already in the tutorial, I'll leave it to you to read it.

Upvotes: 5

mega.venik

Reputation: 658

For example, you can make a simple for loop, like this:

def webIterate():
    # Build each question URL from the shared base link.
    base_link = "http://stackoverflow.com/questions/"
    for i in range(24):
        print("%s%d" % (base_link, i))

webIterate()

The output will be:

http://stackoverflow.com/questions/0
http://stackoverflow.com/questions/1
http://stackoverflow.com/questions/2
http://stackoverflow.com/questions/3
...
http://stackoverflow.com/questions/23

It's just an example. You can pass in the question numbers and do whatever you want with them.
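If the question numbers are known, one way to avoid building a giant list is to generate the URLs lazily. The sketch below is written for Python 3, and the function name `question_urls` is made up for this example.

```python
def question_urls(question_ids, base_link="http://stackoverflow.com/questions/"):
    """Lazily yield one URL per question id instead of building a list."""
    for question_id in question_ids:
        yield "%s%d" % (base_link, question_id)


# Each URL is produced only when the loop asks for it.
for url in question_urls(range(3)):
    print(url)
```

Any iterable of integers works as `question_ids`, so the ids could come from a file or another generator.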

Upvotes: -2
