Reputation: 117
I am trying to scrape this web page:
https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/
I tried different approaches, but every time I get a syntax error. I don't know much about Python or Scrapy. Can anyone help me?
My requirements are:
In the header section of the page, there is a background image, some description and 2 product-related images.
In the Product Range section there are a number of images. I would like to go through all of them, follow each one to its product page, and scrape the individual product details.
The structure is like this:
Here is my code so far:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "plumber"
    start_urls = [
        'https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/',
    ]

    def parse(self, response):
        for divs in response.css('div#product-variants div.viewport div.workspace div.float-box'):
            yield {
                #response.css('div#product-variants a::attr(href)').extract()
                'producturl': divs.css('a::attr(href)').extract(),
                'imageurl': divs.css('a img::attr(src)').extract(),
                'description' : divs.css('a div.text::text').extract() + divs.css('a span.nowrap::text').extract(),
                next_page = producturl
                next_page = response.urljoin(next_page)
                yield scrapy.Request(next_page, callback=self.parse)
            }
Upvotes: 0
Views: 591
Reputation: 21446
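You should move the next_page request out of your item. Statements such as an assignment or a yield cannot appear inside a dict literal, which is what causes the syntax error.
In general you can iterate over the products, build a partial item for each one, and carry it over to the next callback in your request's meta parameter, like so: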
def parse(self, response):
    for divs in response.css('div#product-variants div.viewport div.workspace div.float-box'):
        # Build a partial item from the listing page. extract_first() returns a
        # single string, which response.urljoin() needs; extract() returns a list.
        item = {'producturl': divs.css('a::attr(href)').extract_first(),
                'imageurl': divs.css('a img::attr(src)').extract(),
                'description': divs.css('a div.text::text').extract() + divs.css('a span.nowrap::text').extract()}
        next_page = response.urljoin(item['producturl'])
        # Pass the partial item on to the product-page callback via meta.
        yield scrapy.Request(next_page, callback=self.parse_page, meta={'item': item})

def parse_page(self, response):
    """This is the individual product page."""
    item = response.meta['item']
    item['something_new'] = 'some_value'
    return item
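One thing to watch for when building next_page: response.urljoin expects a single URL string, whereas .extract() returns a list of every match (.extract_first() returns just one, or None). As a minimal sketch of the resolution step, the snippet below uses the standard library's urljoin, which behaves much like response.urljoin against the page URL. The href values and the absolute_links helper are hypothetical, made up for illustration; only the page URL comes from the question.

```python
from urllib.parse import urljoin

# Page URL taken from the question.
PAGE_URL = 'https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/'


def absolute_links(page_url, hrefs):
    """Resolve each href against page_url, skipping empty or missing values."""
    return [urljoin(page_url, href) for href in hrefs if href]


# Hypothetical href values, standing in for what a::attr(href) might return.
hrefs = ['/in/7780/example-product/', 'detail.html', None]
links = absolute_links(PAGE_URL, hrefs)

for link in links:
    print(link)
```

Inside the spider, the same idea would mean looping over the extracted hrefs (or using extract_first()) and yielding one scrapy.Request per resolved link.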
Upvotes: 2