Reputation: 15
I am trying to scrape an e-commerce page with Scrapy, and the code looks like this:
import scrapy
from scrapy import Request


class HugobossSpider(scrapy.Spider):
    name = 'hugoboss'
    # allowed_domains takes bare domain names, not full URLs with paths
    allowed_domains = ['hugoboss.com']
    start_urls = ['https://hugoboss.com/de/herren-schuhe/?sz=60&start=0']

    def parse(self, response):
        # The main method of the spider. It scrapes the URL(s) specified in the
        # 'start_urls' attribute above. The content of the scraped URL is passed on
        # as the 'response' object.
        nextpageurl = response.xpath("//a[@title='Weiter']/@href")
        for item in self.scrape(response):
            yield item
        if nextpageurl:
            path = nextpageurl.extract_first()
            nextpage = response.urljoin(path)
            print("Found url: {}".format(nextpage))
            yield Request(nextpage, callback=self.parse)

    def parse(self, response):
        # Extract the content using XPath and CSS selectors
        url = response.xpath('//div/@data-mouseoverimage').extract()
        product_title = response.xpath('//*[@class="product-tile__productInfoWrapper product-tile__productInfoWrapper--is-small font__subline"]/text()').extract()
        price = response.css('.product-tile__offer .price-sales::text').getall()
        # Combine the extracted content row-wise
        for item in zip(url, product_title, price):
            # Create a dictionary to store the scraped info
            item = {
                'URL': item[0],
                'Product Name': item[1].replace("\n", '').replace("von", ""),
                'Price': item[2]
            }
            # Yield the scraped info to Scrapy
            yield item
The problem is that the code extracts the information from the current page only, not from all the pages. Can somebody help?
Upvotes: 0
Views: 72
Reputation: 523
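You have defined the method parse() twice. The second definition replaces the first, so the pagination code in the first one never runs.
Rename the second one and try again. The first parse() already calls self.scrape(response), so scrape() is the name that matches (with another name such as extract(), that call has to be changed too).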
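To see why only the current page gets scraped, here is a minimal sketch in plain Python, without Scrapy. The class names BuggySpider and FixedSpider and the string "follow next page" are made up for illustration; they only stand in for the spider and for the Request to the next page. When a class body defines the same method name twice, Python keeps the last definition without any warning.

```python
class BuggySpider:
    def parse(self, response):
        # Intended entry point: yield the items, then follow the next page.
        yield from self.scrape(response)
        yield "follow next page"

    def parse(self, response):  # same name: silently replaces the method above
        yield "item from " + response


class FixedSpider:
    def parse(self, response):
        # Entry point: yield the items, then follow the next page.
        yield from self.scrape(response)
        yield "follow next page"

    def scrape(self, response):  # distinct name, matches the call in parse()
        yield "item from " + response


buggy_output = list(BuggySpider().parse("page 1"))
fixed_output = list(FixedSpider().parse("page 1"))

print(buggy_output)  # ['item from page 1']
print(fixed_output)  # ['item from page 1', 'follow next page']
```

In the buggy version the pagination step never runs because the first parse() no longer exists on the class. That matches what you see: items from the first page and nothing else.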
Upvotes: 1