Ganesh Chavan

Reputation: 23

Crawl pages using Scrapy

I'm new to Scrapy. I want to crawl the products on this page. My code crawls only the first page, picks up only around 15 products, and then stops. I also want it to crawl the next page. Any help, please?

Here is my class:

class AllyouneedSpider(CrawlSpider):
    name = "allyouneed"
    allowed_domains = ["de.allyouneed.com"]

    start_urls = ['http://de.allyouneed.com/de/sportschuhe-/8799665488014/']

    rules = (
        Rule(LxmlLinkExtractor(allow=(), restrict_xpaths='//*[@class="itm fst jf-lDiv"]//a[@href]'), callback='parse_obj', process_links="parse_filter"),
        Rule(LxmlLinkExtractor(restrict_xpaths='//*[@id="M62_searchhit"]//a[@href]')),
    )

    def parse_filter(self, links):
        for link in links:
            if self.allowed_domains[0] not in link.url:
                pass  # print link.url
            # print links
        return links

    def parse_obj(self, response):
        item = AllyouneedItem()
        sel = scrapy.Selector(response)
        item['url'] = []
        url = response.selector.xpath('//*[@id="M62_searchhit"]//a[@href]').extract()
        ti = response.selector.xpath('//span[@itemprop="name"]/text()').extract()
        dec = response.selector.xpath('//div[@class="m-desc m-desc-t"]//text()').extract()
        cat = response.selector.xpath('//span[@itemprop="title"]/text()').extract()

        if ti:
            item['title'] = ti
            item['url'] = response.url
            item['category'] = cat
            item['decription'] = dec
            print(item)
            yield item

Upvotes: 1

Views: 120

Answers (1)

Steve

Reputation: 976

Use restrict_xpaths=('//a[@class="nxtPge"]'), which matches the link to the next page. There's no need to extract every link on the page, just that one. You also don't need to filter the URLs yourself, because Scrapy already drops links outside allowed_domains by default.

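# callback belongs to Rule, not LinkExtractor; follow=True keeps paginating past page 2
Rule(LinkExtractor(allow=(), restrict_xpaths='//a[@class="nxtPge"]'), callback='parse_obj', follow=True)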

You can also simplify parse_obj() by dropping the explicit Selector and the item['url'] = [] initialisation, and by calling response.xpath() directly:

item = AllyouneedItem()
url = response.xpath( etc...

Upvotes: 1
