Reputation: 23
I'm new to Scrapy. I want to crawl the products on this page. My code only crawls the first page, and only around 15 products, then it stops. I also want it to crawl the next page. Any help, please?
Here is my class:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor

from allyouneed.items import AllyouneedItem  # adjust to wherever AllyouneedItem is defined


class AllyouneedSpider(CrawlSpider):
    name = "allyouneed"
    allowed_domains = ["de.allyouneed.com"]
    start_urls = ['http://de.allyouneed.com/de/sportschuhe-/8799665488014/']

    rules = (
        Rule(LxmlLinkExtractor(allow=(), restrict_xpaths='//*[@class="itm fst jf-lDiv"]//a[@href]'),
             callback='parse_obj', process_links='parse_filter'),
        Rule(LxmlLinkExtractor(restrict_xpaths='//*[@id="M62_searchhit"]//a[@href]')),
    )

    def parse_filter(self, links):
        for link in links:
            if self.allowed_domains[0] not in link.url:
                pass  # print link.url
        # print links
        return links

    def parse_obj(self, response):
        item = AllyouneedItem()
        sel = scrapy.Selector(response)
        item['url'] = []
        url = response.selector.xpath('//*[@id="M62_searchhit"]//a[@href]').extract()
        ti = response.selector.xpath('//span[@itemprop="name"]/text()').extract()
        dec = response.selector.xpath('//div[@class="m-desc m-desc-t"]//text()').extract()
        cat = response.selector.xpath('//span[@itemprop="title"]/text()').extract()
        if ti:
            item['title'] = ti
            item['url'] = response.url
            item['category'] = cat
            item['decription'] = dec
            print item
            yield item
Upvotes: 1
Views: 120
Reputation: 976
Use restrict_xpaths='//a[@class="nxtPge"]', which will find the link to the next page. There is no need to extract every link on the page, just that one. You also don't need to filter the URLs yourself, since Scrapy drops off-site links by default based on allowed_domains.
Rule(LinkExtractor(restrict_xpaths='//a[@class="nxtPge"]'), callback='parse_obj')
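Put together, the rules could look like the sketch below (a minimal sketch, assuming you keep the product-link XPath from your spider for the detail pages and only need to follow, not parse, the next-page link):

from scrapy.linkextractors import LinkExtractor

rules = (
    # product detail pages: extract them and hand them to parse_obj
    Rule(LinkExtractor(restrict_xpaths='//*[@class="itm fst jf-lDiv"]//a'),
         callback='parse_obj'),
    # pagination: follow the single "next page" link so the rules are
    # applied again on every listing page
    Rule(LinkExtractor(restrict_xpaths='//a[@class="nxtPge"]'), follow=True),
)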
You can also simplify parse_obj(): there is no need to create a Selector (response.xpath() works directly), and no need to pre-initialise item['url'] with an empty list,
item = AllyouneedItem()
url = response.xpath( etc...
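A minimal sketch of what the trimmed-down callback could look like, assuming AllyouneedItem defines the same fields used in your spider:

def parse_obj(self, response):
    item = AllyouneedItem()
    title = response.xpath('//span[@itemprop="name"]/text()').extract()
    if title:
        item['title'] = title
        item['url'] = response.url
        item['category'] = response.xpath('//span[@itemprop="title"]/text()').extract()
        # field name kept as spelled in your item class
        item['decription'] = response.xpath('//div[@class="m-desc m-desc-t"]//text()').extract()
        yield item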
Upvotes: 1