Reputation: 27
I am new to Python and web scraping and I am wondering if it is possible to scrape product pages with Scrapy.
Example: I search for monitors on amazon.com. I would like Scrapy to go to each product page and scrape from there, instead of just scraping the data from the search results page.
I read something about XPath, but I am not sure if it is possible with that, and all the other resources I found seem to do the scraping with other tools like Beautiful Soup. I currently have a Scrapy project which scrapes a search results page, but I would like to improve it to scrape from the product pages.
Edit:
Here's my modified spider.py based on your suggestions:
import scrapy
from scrapy import Request
from ..items import scrapeItem


class TestSpiderSpider(scrapy.Spider):
    name = 'testscraper'
    page_number = 2
    start_urls = ['https://jamaicaclassifiedonline.com/auto/cars/']

    def parse(self, response):
        for car in response.css('.col.l3.s12.m6'):
            product_title = car.css('.jco-card-title::text').get()
            # The link is in the href attribute, not the element text
            product_link = car.css('.tooltipped.valign::attr(href)').get()
            url = response.urljoin(product_link)
            yield Request(url, cb_kwargs={'product_title': product_title},
                          callback=self.parse_car)

    def parse_car(self, response, product_title):
        # This callback runs on the product page, so select from response
        items = scrapeItem()
        product_description = response.css('.wysiwyg::text').get()
        product_imagelink = response.css('.responsive-img img::attr(data-src)').getall()
        items['product_title'] = product_title
        items['product_description'] = product_description
        items['product_imagelink'] = product_imagelink
        yield items
Here's the code for items.py:
import scrapy


class scrapeItem(scrapy.Item):
    product_title = scrapy.Field()
    product_description = scrapy.Field()
    product_imagelink = scrapy.Field()
There is currently an error when I try to run it. It seems to be related to the
yield Request
Hopefully I am on the right track.
I also added the loop to the code.
Upvotes: 0
Views: 1301
Reputation: 2564
This type of question is better answered with a case in point, where you provide your code and explain what you have already tried to do.
In a general way, here is how you do that:
1. In the parsing method for the results page, select the href attribute (that is, the URL) of the items whose product page you want to request. (This can be done with the selectors.)
2. Yield a Request to each of those URLs, pointing it to a parsing method for the product page (its callback attribute).
To make it more clear, here is a very broad example (it doesn't really work, it's meant to illustrate):
from scrapy import Request, Spider


class ExampleSpider(Spider):
    name = "example"
    start_urls = ['https://www.example.com']

    def parse(self, response):
        products = response.xpath('//div[@class="products"]')
        for product in products:
            product_name = product.xpath('a/text()').get()
            href = product.xpath('a/@href').get()
            url = response.urljoin(href)  # This builds a full URL when href is a relative URL
            yield Request(url, cb_kwargs={'product_name': product_name}, callback=self.parse_product)

    def parse_product(self, response, product_name):  # Notice it will receive a new arg here, as passed in cb_kwargs
        description = response.xpath('//article[@id="desc"]//text()').getall()
        price = response.xpath('//div[@id="price"]/text()').get()
        yield {
            'product_name': product_name,
            'price': price,
            'description': description
        }
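A side note on response.urljoin: it behaves like the standard library's urllib.parse.urljoin, resolving the extracted href against the page's URL. Here is a small sketch of that resolution (the URLs are made up for illustration):

```python
from urllib.parse import urljoin

base = "https://www.example.com/auto/cars/"

# A relative href is resolved against the page URL
print(urljoin(base, "listing/123"))
# An absolute href is returned unchanged
print(urljoin(base, "https://cdn.example.com/img.png"))
# A root-relative href replaces the whole path
print(urljoin(base, "/auto/trucks/"))
```

This is why you can pass whatever the selector returns straight into Request without checking whether the site uses relative or absolute links.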
Upvotes: 4