Hell_ Raisin

Reputation: 49

How Do I Set Up Pagination correctly?

I'm currently working on a Scrapy spider that extracts three fields for each product: title, price, and upc. The spider already scrapes title and price correctly, but I am having trouble scraping upc because the UPC is on another page.

For each product, I want my program to extract the title and price on the main page, then follow the link to the product's own page to extract the UPC code. Once it has the UPC code, it should move on to the next product on the main page and repeat the same steps for the remaining products.

Here is my code.

import scrapy
from scrapy.utils.response import open_in_browser
from ..items import QuotetutorialItem

data={hidden}
headers={hidden}

class BrickseekSpider(scrapy.Spider):
    name = 'brickseek1'
    allowed_domains = ['brickseek.com']

    def start_requests(self):
        dont_filter = True
        yield scrapy.http.FormRequest(url='https://brickseek.com/login/', headers=headers, formdata=data,
                                      callback=self.parse)

    def parse(self, response):
        items = QuotetutorialItem()
        products = response.css('div.item-list__tile')

        for product in products:
            title = product.css('.item-list__title span::text').extract()
            price = product.css('.item-list__price-column--highlighted .price-formatted__dollars::text').extract()

        #another_page = response.css('div.item-list__tile a::attr(href)').get()
        #if another_page:
            #upc = product.css('div.item-overview__meta-item::text').extract()[6]
            #yield response.follow(another_page, callback=self.parse)

        items['title'] = title
        items['price'] = price
        #items['upc'] = upc

        yield items

Upvotes: 0

Views: 87

Answers (1)

wishmaster

Reputation: 1487

All you need to do is put your item (after filling in title and price) in meta when you request the product page, then complete it in a second callback (assuming your CSS selectors are correct).

def parse(self, response):
    products = response.css('div.item-list__tile')

    for product in products:
        # Create a fresh item per product so products do not overwrite each other
        item = QuotetutorialItem()
        item['title'] = product.css('.item-list__title span::text').extract()
        item['price'] = product.css('.item-list__price-column--highlighted .price-formatted__dollars::text').extract()
        # Take the link from this product's tile, not from the whole page
        another_page = product.css('a::attr(href)').get()
        if another_page:
            # Carry the partially filled item to the detail-page callback
            yield response.follow(another_page, callback=self.parse_upc, meta={'item': item})
        else:
            yield item

def parse_upc(self, response):
    item = response.meta['item']
    # Here the detail page is the response; index 6 comes from the question's selector
    item['upc'] = response.css('div.item-overview__meta-item::text').extract()[6]
    yield item
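
To make the control flow easier to follow, here is a toy model of the same pattern in plain Python. It does not use Scrapy at all: FakeRequest, parse_list, parse_detail, crawl and the sample data are hypothetical names made up for this sketch. It only illustrates how a partially filled item travels with the request to a second callback, which then adds the missing field and yields the finished item.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: a URL, a callback, and a meta dict."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


def parse_list(products):
    """First callback: fill title and price, then hand the item to the next step."""
    for product in products:
        # A new dict per product, so one product never overwrites another
        item = {'title': product['title'], 'price': product['price']}
        if product.get('detail_url'):
            # The item rides along in meta until the detail page is handled
            yield FakeRequest(product['detail_url'], parse_detail, meta={'item': item})
        else:
            yield item


def parse_detail(request, detail_pages):
    """Second callback: pick the item back up and add the missing field."""
    item = request.meta['item']
    item['upc'] = detail_pages[request.url]
    yield item


def crawl(products, detail_pages):
    """Tiny driver that plays the role of the framework's scheduler."""
    results = []
    for output in parse_list(products):
        if isinstance(output, FakeRequest):
            results.extend(output.callback(output, detail_pages))
        else:
            results.append(output)
    return results


# Made-up sample data: one product with a detail page, one without
sample_products = [
    {'title': 'Widget', 'price': '9', 'detail_url': '/widget'},
    {'title': 'Gadget', 'price': '5', 'detail_url': None},
]
sample_detail_pages = {'/widget': '000111222333'}

for finished_item in crawl(sample_products, sample_detail_pages):
    print(finished_item)
```

In the real spider, Scrapy does the job of crawl: it schedules the request, downloads the page, and calls the callback with a response whose meta contains the item.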

Upvotes: 1
