I'm currently working on a Scrapy spider that extracts three fields for each product, which I call "title", "price", and "upc". The spider scrapes the title and price correctly, but I'm having trouble scraping the UPC because it is on a separate product page.
For each product, I want the program to extract the title and price from the main page, then open the product's own page to extract the UPC code. Once it has the UPC, it should move on to the next product on the main page and repeat the same steps for the remaining products.
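Stripped of Scrapy, the flow described above is a two-stage pipeline: stage one builds a partial item from the listing page, and stage two completes that same item from the product page. The following pure-Python sketch models it with hypothetical in-memory pages; `LISTING`, `DETAIL_PAGES`, `parse_listing`, `parse_detail` and `crawl` are illustrative names only, not part of Scrapy or of the site, and no network access is involved.

```python
# Hypothetical in-memory "site": one listing page plus one detail page per product.
LISTING = [
    {"title": "Lego Set A", "price": "19", "detail_url": "/p/a"},
    {"title": "Lego Set B", "price": "25", "detail_url": "/p/b"},
]
DETAIL_PAGES = {"/p/a": {"upc": "0001"}, "/p/b": {"upc": "0002"}}


def parse_listing(listing):
    """Stage 1: build a partial item per product and pair it with its detail URL."""
    for product in listing:
        item = {"title": product["title"], "price": product["price"]}
        yield product["detail_url"], item


def parse_detail(page, item):
    """Stage 2: complete the partial item using data from the detail page."""
    item["upc"] = page["upc"]
    return item


def crawl(listing, pages):
    """Drive both stages; each partial item travels with its own detail URL."""
    results = []
    for url, partial in parse_listing(listing):
        results.append(parse_detail(pages[url], partial))
    return results


print(crawl(LISTING, DETAIL_PAGES))
# [{'title': 'Lego Set A', 'price': '19', 'upc': '0001'}, {'title': 'Lego Set B', 'price': '25', 'upc': '0002'}]
```

In Scrapy the `crawl` loop is handled by the framework: `parse_listing` corresponds to a `parse` callback that yields one request per product, and `parse_detail` to the callback attached to that request.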
Here is my code.
```python
import scrapy
from scrapy.utils.response import open_in_browser
from ..items import QuotetutorialItem

data={hidden}
headers={hidden}


class BrickseekSpider(scrapy.Spider):
    name = 'brickseek1'
    allowed_domains = ['brickseek.com']

    def start_requests(self):
        dont_filter = True
        yield scrapy.http.FormRequest(url='https://brickseek.com/login/', headers=headers, formdata=data,
                                      callback=self.parse)

    def parse(self, response):
        items = QuotetutorialItem()
        products = response.css('div.item-list__tile')
        for product in products:
            title = product.css('.item-list__title span::text').extract()
            price = product.css('.item-list__price-column--highlighted .price-formatted__dollars::text').extract()
            #another_page = response.css('div.item-list__tile a::attr(href)').get()
            #if another_page:
                #upc = product.css('div.item-overview__meta-item::text').extract()[6]
                #yield response.follow(another_page, callback=self.parse)
            items['title'] = title
            items['price'] = price
            #items['upc'] = upc
            yield items
```
All you need to do is put your item (after filling in title and price) in `meta` when you request the product page, then read it back in that page's callback (assuming your CSS selectors are correct):
```python
def parse(self, response):
    products = response.css('div.item-list__tile')
    for product in products:
        # A fresh item per product, so products don't overwrite each other
        item = QuotetutorialItem()
        item['title'] = product.css('.item-list__title span::text').extract()
        item['price'] = product.css('.item-list__price-column--highlighted .price-formatted__dollars::text').extract()
        # Take the link from this product's tile, not the first tile on the page
        another_page = product.css('a::attr(href)').get()
        if another_page:
            # Carry the partly filled item to the product-page callback
            yield response.follow(another_page, callback=self.parse_upc, meta={'item': item})
        else:
            yield item

def parse_upc(self, response):
    item = response.meta['item']
    # Select from the product page's response; 'product' is not defined here
    meta_items = response.css('div.item-overview__meta-item::text').extract()
    item['upc'] = meta_items[6] if len(meta_items) > 6 else None
    yield item
```