Reputation: 333

Creating a loop over a URL so that I can scrape each page for URLs

Thanks so much for your help. I'm trying to write a script that will scrape 589 URLs and collect all of the URLs on each of those 589 pages. The only thing that changes in the URL is the number that follows "page=".

This code isn't giving me an error message, but it also isn't doing anything.

for i in xrange(589,1):
    page = urllib2.urlopen("http://www.teapartynation.com/profiles/blog/list?page={}".format(i))
    soup = BeautifulSoup(page.read())
    with io.open('TPNurls.txt', 'w', encoding='utf8') as logfile:
       for link in soup.find_all('a', 'xj_expandable'):
            linklist=(link.get('href'))
            logfile.write(linklist + u"\n")

What could the problem be? I don't know where to start without an error message. Thank you in advance.

Upvotes: 0

Answers (3)

John1024

Reputation: 113994

There were several issues, but this works:

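import urllib2
import io
from BeautifulSoup import BeautifulSoup

# Open the output file once, before the loop; reopening it with 'w' on
# every iteration would truncate it and keep only the last page's links.
with io.open('TPNurls.txt', 'w', encoding='utf8') as logfile:
    for i in xrange(1, 589):  # use xrange(1, 590) if there are 589 pages
        page = urllib2.urlopen("http://www.teapartynation.com/profiles/blog/list?page={}".format(i))
        soup = BeautifulSoup(page.read())
        # Collect every <a> tag whose class is 'xj_expandable'.
        for link in soup.findAll('a', 'xj_expandable'):
            linklist = link.get('href')
            logfile.write(linklist + u"\n")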

The xrange arguments need to be reversed.
You said there were 589 pages but note that xrange(1, 589) will only count up to 588. If there really are 589 pages, then you need to use xrange(1, 590). This is because xrange stops before the second argument is reached.
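soup.find_all needs to be replaced with soup.findAll, because this code imports BeautifulSoup 3 (from BeautifulSoup import BeautifulSoup), where the method is named findAll. The find_all spelling belongs to BeautifulSoup 4 (the bs4 package).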
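
A minimal sketch of the same loop with the output file opened once, before iterating, so that links from every page end up in the file. It is written for Python 3 (where range plays the role of xrange) and uses only the standard library instead of BeautifulSoup. The names page_urls, LinkCollector, extract_links, and scrape, as well as the fetch callback, are hypothetical and chosen for illustration only.

```python
import io
from html.parser import HTMLParser

BASE_URL = "http://www.teapartynation.com/profiles/blog/list?page={}"


def page_urls(last_page):
    """Return the list URLs for pages 1..last_page (inclusive)."""
    # range stops before its second argument, hence last_page + 1.
    return [BASE_URL.format(i) for i in range(1, last_page + 1)]


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_links(html, css_class="xj_expandable"):
    """Return the hrefs of all matching links in an HTML string."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    return parser.links


def scrape(last_page, fetch, out_path):
    """Write every matching link from pages 1..last_page to out_path.

    fetch is any callable that takes a URL and returns the page's HTML
    as text (an assumption of this sketch, e.g. a thin wrapper around
    urllib.request.urlopen).
    """
    # Open the file once, outside the loop, so earlier pages are kept.
    with io.open(out_path, "w", encoding="utf8") as logfile:
        for url in page_urls(last_page):
            for link in extract_links(fetch(url)):
                logfile.write(link + "\n")
```

Passing the fetch function in as an argument keeps the loop and the file handling testable without touching the network; for a real run it would be replaced by a function that downloads and decodes each page.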

Upvotes: 0

Saish

Reputation: 519

The statement

xrange(589, 1)

yields no values, because it means "go from 589 up to 1 in increments of 1". The start is already past the stop, so the loop ends before it starts.

You perhaps mean:

xrange(589, 1, -1)

if you prefer to go backwards from 589 to 1 (1 is excluded).

Or:

xrange(1, 589)

if you want to go forward (589 is excluded).

From xrange help, the syntax is:

xrange(start, stop[, step])
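
The start/stop/step rules can be checked directly. The sketch below uses Python 3's range, which follows the same rules as Python 2's xrange; the helper name pages is hypothetical.

```python
def pages(start, stop, step=1):
    """Materialise range(start, stop, step) as a list for inspection."""
    return list(range(start, stop, step))


empty = pages(589, 1)          # positive step with start > stop: nothing to iterate
backward = pages(589, 1, -1)   # 589 down to 2; the stop value 1 is excluded
forward = pages(1, 589)        # 1 up to 588; the stop value 589 is excluded
all_pages = pages(1, 590)      # 1 up to 589, covering all 589 pages

print(len(empty), backward[0], backward[-1], forward[-1], all_pages[-1])
# prints: 0 589 2 588 589
```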

Upvotes: 1

sundar nataraj

Reputation: 8702

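Your loop header

for i in xrange(589,1):

needs to be

for i in xrange(589,1,-1):

Note that this counts down from 589 to 2, because the stop value is excluded; use xrange(589,0,-1) to include page 1.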

Upvotes: 0
