Reputation: 333
Thanks so much for your help. I'm trying to write a script that will scrape 589 URLs and collect all of the links on each of those 589 pages. The only thing that changes in the URL is the number that follows "page=".
This code isn't giving me an error message, but it also isn't doing anything.
for i in xrange(589,1):
    page = urllib2.urlopen("http://www.teapartynation.com/profiles/blog/list?page={}".format(i))
    soup = BeautifulSoup(page.read())
    with io.open('TPNurls.txt', 'w', encoding='utf8') as logfile:
        for link in soup.find_all('a', 'xj_expandable'):
            linklist = link.get('href')
            logfile.write(linklist + u"\n")
What could the problem be? I don't know where to start without an error message. Thank you in advance.
Upvotes: 0
Views: 106
Reputation: 113994
There were several issues but this works:
import urllib2
import io

from BeautifulSoup import BeautifulSoup

with io.open('TPNurls.txt', 'w', encoding='utf8') as logfile:
    for i in xrange(1, 589):
        page = urllib2.urlopen("http://www.teapartynation.com/profiles/blog/list?page={}".format(i))
        soup = BeautifulSoup(page.read())
        for link in soup.findAll('a', 'xj_expandable'):
            linklist = link.get('href')
            logfile.write(linklist + u"\n")
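One more detail: the output file should be opened once, outside the loop. Mode 'w' truncates the file on every open, so re-opening it per page keeps only the last page's links. A quick demonstration (the file names here are just for illustration):

```python
# 'w' inside the loop: each open() wipes the previous contents.
for i in range(3):
    with open('inside.txt', 'w') as f:
        f.write('page {}\n'.format(i))

# Open once, outside the loop: every line is kept.
with open('outside.txt', 'w') as f:
    for i in range(3):
        f.write('page {}\n'.format(i))

print(open('inside.txt').read())   # only "page 2"
print(open('outside.txt').read())  # all three lines
```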
The xrange arguments need to be reversed: xrange(589, 1) counts up from 589 with the default step of +1, so it produces nothing.
You said there were 589 pages, but note that xrange(1, 589) will only count up to 588. If there really are 589 pages, then you need to use xrange(1, 590), because xrange stops before the second argument is reached.
soup.find_all needs to be replaced with soup.findAll; find_all is the BeautifulSoup 4 spelling, while the BeautifulSoup 3 import used here provides findAll.
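Python 3's range has the same half-open semantics as Python 2's xrange, so the off-by-one is easy to verify interactively:

```python
# range/xrange exclude the stop value: range(1, 589) ends at 588.
pages = list(range(1, 589))
print(pages[0], pages[-1], len(pages))   # 1 588 588

# To include page 589, the stop must be 590.
pages = list(range(1, 590))
print(pages[-1], len(pages))             # 589 589
```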
Upvotes: 0
Reputation: 519
The statement xrange(589, 1) produces nothing, as it means "go from 589 to 1 in increments of +1". The loop ends before it starts.
You perhaps mean xrange(589, 1, -1) if you prefer to go backwards from 589 to 1 (1 is excluded), or xrange(1, 589) if you want to go forward (589 is excluded).
From the xrange help, the syntax is:
xrange(start, stop[, step])
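The same signature survives as Python 3's range, so the three forms can be compared directly:

```python
# Empty: start is above stop, and the default step is +1.
print(list(range(589, 1)))          # []

# Counting down: negative step; the stop value 1 is excluded.
countdown = list(range(589, 1, -1))
print(countdown[0], countdown[-1])  # 589 2

# Counting up: the stop value 589 is excluded.
forward = list(range(1, 589))
print(forward[0], forward[-1])      # 1 588
```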
Upvotes: 1
Reputation: 8702
Your for i in xrange(589,1) needs to be for i in xrange(589,1,-1).
Upvotes: 0