Biryanii

Reputation: 81

Duplicate data when scraping with Scrapy

python

I am using Scrapy to scrape data from a website. For each graphics card I want the title, the price, and whether it is in stock. The problem is that my code loops twice, so instead of 10 products I get 20.

import scrapy

class ThespiderSpider(scrapy.Spider):
    name = 'Thespider'
    start_urls = ['https://www.czone.com.pk/graphic-cards-pakistan-ppt.154.aspx?page=2']

    def parse(self, response):
        data = {}
        cards = response.css('div.row')
        for card in cards:
            for c in card.css('div.product'):
                data['Title'] = c.css('h4 a::text').getall()
                data['Price'] = c.css('div.price span::text').getall()
                data['Stock'] = c.css('div.product-stock span.product-data::text').getall()
                yield data

Upvotes: 0

Views: 101

Answers (1)

AaronS

Reputation: 2335

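You're using a nested for loop when one isn't necessary. card.css('div.product') searches every descendant of card, so if the same product sits inside more than one element matched by div.row (for example, a row nested inside another row), the inner loop visits that product once per enclosing row. That is the most likely reason 10 products come out as 20.

Each card can be captured directly with the CSS selector response.css('div.product'), so a single loop is enough.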

Code Example

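def parse(self, response):
    # Each div.product is one card, so a single loop visits every product once
    cards = response.css('div.product')
    for card in cards:
        # Build a new dict for each product instead of reusing one shared dict
        data = {}
        data['Title'] = card.css('h4 a::text').getall()
        data['Price'] = card.css('div.price span::text').getall()
        data['Stock'] = card.css('div.product-stock span.product-data::text').getall()
        yield data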
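A related hazard in the question's parse(): it creates data = {} once, before the loop, and then mutates and yields that same dict for every product. The plain-Python sketch below (hypothetical function names, no Scrapy involved) shows what happens when anything collects the yielded objects before reading them: every entry is the same dict, holding the last values written.

```python
def shared_dict_items(titles):
    """Mutate one dict and yield it repeatedly, as the question's parse() does."""
    data = {}
    for title in titles:
        data['Title'] = title
        yield data


def fresh_dict_items(titles):
    """Yield a new dict for each title."""
    for title in titles:
        yield {'Title': title}


# list() keeps references to the yielded objects, so the shared dict shows
# only its final state in both positions.
print(list(shared_dict_items(['A', 'B'])))  # [{'Title': 'B'}, {'Title': 'B'}]
print(list(fresh_dict_items(['A', 'B'])))   # [{'Title': 'A'}, {'Title': 'B'}]
```

Whether this shows up in Scrapy's output depends on when downstream code reads each item, so it may not be what caused the doubling here, but creating a fresh dict per product avoids the problem entirely.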

Additional Information

  • Use get() instead of getall(). getall() returns a list of every match, while get() returns only the first match as a string (or None if nothing matches), which is probably what you want here.
  • If you plan to scrape multiple pages, a Scrapy Item may be better than yielding a plain dictionary. There is almost always something you will need to change later, and an Item gives you more flexibility to do that.

Upvotes: 1
