Suraj Patidar

Reputation: 1

Unable to fetch data from all pages in Scrapy

I am unable to fetch all pages using the code below. It only returns data up to page 90 and then raises an AttributeError. I build the next page's URL from the current page number to move to the next page, but after page 90 I get the error shown below.

Running this code:

import scrapy
import re

class PaginationSpider(scrapy.Spider):
    name = 'pagination'
    allowed_domains = ['www.farfetch.com']
    start_urls = ['https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=1']

    total_pages_pattern = r'"totalPages":(\d+)'
    current_page_pattern = r"page=(\d+)"

    def parse(self, response):
        
        number_of_pages= int(re.search(self.total_pages_pattern, str(response.body)).group(1))
        current_page = int(re.search(self.current_page_pattern, response.url).group(1))
        
        for brand in response.xpath("//h3[@itemprop='brand']//text()"):

            yield {
                "brand":brand.get()
            }

        if current_page <= number_of_pages:

            next_page = "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=" + str(current_page+1)
            
            print("Current_page:" + str(current_page))

            yield response.follow(url=next_page, callback=self.parse)

Error : Error image

Upvotes: 0

Views: 146

Answers (1)

renatodvc

Reputation: 2564

    current_page = int(re.search(self.current_page_pattern, response.url).group(1))

re.search() returns a Match object if the pattern matches the string. If there is no match, it returns None. So when the pattern doesn't match, you are calling .group(1) on None.

That's why you are getting an AttributeError.

I didn't execute your code, but you can probably solve it by adding an if statement that checks the result of re.search() before calling .group(1).
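A minimal sketch of that check, assuming the same two patterns as in the question. The helper names `extract_int` and `next_page_number` are hypothetical and are not part of Scrapy; inside the spider you would pass `response.url` and `response.text` to them. The sketch also assumes pagination should stop once the current page equals `totalPages`, so it compares with `<` rather than `<=`.

```python
import re
from typing import Optional

# Same patterns as in the question.
TOTAL_PAGES_PATTERN = r'"totalPages":(\d+)'
CURRENT_PAGE_PATTERN = r"page=(\d+)"


def extract_int(pattern: str, text: str) -> Optional[int]:
    """Return the first captured group as an int, or None if there is no match."""
    match = re.search(pattern, text)
    if match is None:
        # No match: return None instead of calling .group(1) on None.
        return None
    return int(match.group(1))


def next_page_number(url: str, body: str) -> Optional[int]:
    """Return the next page number, or None if pagination should stop."""
    current_page = extract_int(CURRENT_PAGE_PATTERN, url)
    total_pages = extract_int(TOTAL_PAGES_PATTERN, body)
    if current_page is None or total_pages is None:
        # One of the patterns did not match (for example, the response
        # has a different layout), so stop instead of raising.
        return None
    if current_page < total_pages:
        return current_page + 1
    return None
```

In `parse`, you would only yield `response.follow(...)` when `next_page_number(response.url, response.text)` is not None. Logging the URL when either value is missing should also show which of the two patterns fails after page 90.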

Upvotes: 1
