user2129794

Reputation: 2418

Handle pagination in python scrapy

I am using Scrapy (Python) to scrape a particular site. The site's pagination URLs have the form below:

http://www.example.com/s/ref=lp_1805560031_pg_4?rh=n%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031%2Cn%3A1389432031%2Cn%3A1805560031&page=4&ie=UTF8&qid=1400668237

How can I handle the pagination in this case if I want to scrape, say, pages 1 to 30?

I tried this:

class MySpider(BaseSpider):
    start_urls = ['http://www.example.com/s/ref=lp_1805560031_pg_4?rh=n%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031%2Cn%3A1389432031%2Cn%3A1805560031&page=%s&ie=UTF8&qid=1400668237' % page for page in xrange(1,30)]

But it's not working.

EDIT: I am using example.com as the domain just for the purposes of this question.

Upvotes: 2

Views: 1093

Answers (1)

Krasimir

Reputation: 1804

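This should work for you. The URL already contains percent-encoded characters such as %3A, and the % operator tries to read those as format specifiers, which raises an error. str.format only looks at braces, so it leaves them alone. Also note that xrange excludes its stop value, so use xrange(1, 31) to include page 30:

start_urls = ['http://www.example.com/s/ref=lp_1805560031_pg_4?rh=n%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031%2Cn%3A1389432031%2Cn%3A1805560031&page={0}&ie=UTF8&qid=1400668237'.format(page) for page in xrange(1, 31)]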
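If it helps to see the URL building on its own, here is a minimal sketch outside of any spider class. The helper names build_page_urls and percent_formatting_fails are made up for illustration, the URL is the example.com placeholder from the question, and range is used instead of xrange so the snippet runs on both Python 2 and Python 3.

```python
# Placeholder URL from the question, with a named {page} field.
# The percent-encoded parts (%3A, %2C, %21) are left untouched by str.format.
URL_TEMPLATE = (
    "http://www.example.com/s/ref=lp_1805560031_pg_4"
    "?rh=n%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031"
    "%2Cn%3A1389432031%2Cn%3A1805560031"
    "&page={page}&ie=UTF8&qid=1400668237"
)


def build_page_urls(first_page, last_page):
    """Hypothetical helper: one URL per page, both ends included."""
    # range() excludes its stop value, hence the + 1.
    return [URL_TEMPLATE.format(page=n)
            for n in range(first_page, last_page + 1)]


def percent_formatting_fails():
    """Hypothetical check: does %-formatting the same URL raise an error?"""
    broken = URL_TEMPLATE.replace("{page}", "%s")
    try:
        broken % 4  # '%3A' is parsed as a (bad) format specifier
    except (ValueError, TypeError):
        return True
    return False


# This list could be assigned to start_urls in the spider class.
start_urls = build_page_urls(1, 30)
```

If you would rather keep the % operator, doubling every literal percent sign in the URL (%%3A and so on) should also work, but str.format is less error-prone here.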

Upvotes: 4
