Reputation: 27
I am new to Python and web scraping, and I am wondering whether it is possible to scrape from product pages with Scrapy.
Example: I search for monitors on a site, and I would like Scrapy to go to each product page and scrape from there instead of just scraping the data from the search results page.
I read something about XPath, but I am not sure whether it is possible with that, and all the other resources I found seem to do the scraping with other tools such as Beautiful Soup. I currently have a Scrapy project that scrapes a search results page, but I would like to improve it to scrape the product pages.
Here's my modified code based on your suggestions:
class TestSpiderSpider(scrapy.Spider):
    name = 'testscraper'
    page_number = 2
    start_urls = ['']

    def parse(self, response):
        for car in response.css('.col.l3.s12.m6'):
            items = scrapeItem()
            product_title = response.css('.jco-card-title::text').extract()
            product_link = car.css('.tooltipped valign').css('[target]::text').get()
            url = response.urljoin(product_link)
            yield Request(url, cb_kwargs={'product_title': product_title},callback=self.parse_car)

    def parse_car(self, response, product_title):
        product_description = car.css('.wysiwyg::text').get()
        product_imagelink = response.css('.responsive-img img::attr(data-src)').getall()
        items['product_title'] = product_title
        items['product_imagelink'] = product_imagelink
        yield items
Here's the item definition:
class scrapeItem(scrapy.Item):
    product_title = scrapy.Field()
    product_imagelink = scrapy.Field()
There is currently an error when I try to run it. It seems to be related to the
yield Request
line.
Hopefully I am on the right track.
I also added the loop to the code.
Upvotes: 0
Views: 1301
Reputation: 2564
This type of question is better answered with a case in point, where you provide your code and explain what you have already tried to do.
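In general, here is how you do that:
1. On the search results page, extract the href attribute (that is, the URL) of each item whose product page you want to request. (This can be done with selectors.)
2. Yield a Request for each of those URLs and tell it which method should parse the product page (the callback attribute).
To make it clearer, here is a very broad example (it doesn't really work; it's meant to illustrate):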
from scrapy import Request, Spider


class ExampleSpider(Spider):
    name = "example"
    start_urls = ['']

    def parse(self, response):
        # Each matching node is one product entry on the search results page
        products = response.xpath('//div[@class="products"]')
        for product in products:
            product_name = product.xpath('a/text()').get()
            href = product.xpath('a/@href').get()
            url = response.urljoin(href)  # This builds a full URL when href is a relative url
            # Request the product page; cb_kwargs forwards product_name to the callback
            yield Request(url, cb_kwargs={'product_name': product_name}, callback=self.parse_product)

    def parse_product(self, response, product_name):  # Notice it will receive a new arg here, as passed in cb_kwargs
        description = response.xpath('//article[@id="desc"]//text()').getall()
        price = response.xpath('//div[@id="price"]/text()').get()
        yield {
            'product_name': product_name,
            'price': price,
            'description': description
        }
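The urljoin step in the example can be tried on its own, without running a spider. Scrapy documents response.urljoin as a wrapper around the standard library's urllib.parse.urljoin, so the sketch below uses that function directly. The helper name absolute_product_urls, the shop.example domain, and the hrefs are all made up for illustration; treat it as a rough approximation, since on HTML pages Scrapy also honours a base tag if one is present.

```python
from urllib.parse import urljoin


def absolute_product_urls(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on.

    Hypothetical helper: it mirrors what response.urljoin(href) does for
    a response whose URL is page_url (ignoring any <base> tag).
    """
    return [urljoin(page_url, href) for href in hrefs]


# Made-up search page URL and hrefs, as they might appear in <a href="...">.
search_page = "https://shop.example/search?q=monitors"
hrefs = [
    "/products/42",                  # root-relative path
    "products/43",                   # path relative to the current page
    "https://cdn.example/item/44",   # already absolute, left unchanged
]

for url in absolute_product_urls(search_page, hrefs):
    print(url)
```

Each resulting URL is what you would pass to Request in the parse method, which is why the join has to happen before the request is yielded.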
Upvotes: 4