Scrapy can't crawl all pages

Question

I am trying to scrape an e-commerce page with Scrapy, and my code looks like this:

import scrapy
from scrapy import Request

class HugobossSpider(scrapy.Spider):
    name = 'hugoboss'
    allowed_domains = ['hugoboss.com/de/herren-schuhe/?sz=60&start=0']
    start_urls = ['https://hugoboss.com/de/herren-schuhe/?sz=60&start=0']

    def parse(self, response):
        # The main method of the spider. It scrapes the URL(s) specified in the
        # 'start_urls' attribute above. The content of the scraped URL is passed on
        # as the 'response' object.

        nextpageurl = response.xpath("//a[@title='Weiter']/@href")

        for item in self.scrape(response):
            yield item

        if nextpageurl:
            path = nextpageurl.extract_first()
            nextpage = response.urljoin(path)
            print("Found url: {}".format(nextpage))
            yield Request(nextpage, callback=self.parse)

    def parse(self, response):
        # Extract the content using XPath and CSS selectors
        url = response.xpath('//div/@data-mouseoverimage').extract()
        product_title = response.xpath('//*[@class="product-tile__productInfoWrapper product-tile__productInfoWrapper--is-small font__subline"]/text()').extract()
        price = response.css('.product-tile__offer .price-sales::text').getall()
    #Give the extracted content row wise
        for item in zip(url,product_title,price):
        #create a dictionary to store the scraped info
            item = {
              'URL' : item[0],
              'Product Name' : item[1].replace("\n", '').replace("von", ""),
              'Price' : item[2]
            }

        #yield or give the scraped info to scrapy
            yield item

The problem is that the code extracts the information from the current page but not from the following pages. Can somebody help?
