Redshoe

Reputation: 161

div class selection using scrapy shell css selector returns empty

I am trying to scrape the T-shirt price from the following link: https://www.adidas.com/us/search?q=tshirt

On that page, I am looking at the line that says

<div class="gl-price-item gl-price-item--sale notranslate">$36</div>

This is what I did, and what I got:

>>> fetch('https://www.adidas.com/us/search?q=tshirt')
2022-09-25 23:50:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adidas.com/us/search?q=tshirt> (referer: None)
>>> response.css('div.gl-price-item.gl-price-item--sale.notranslate')
[]

I'd expect at least one item to be returned from response.css('div.gl-price-item.gl-price-item--sale.notranslate'), because an element with the classes gl-price-item, gl-price-item--sale and notranslate contains $36, but I am getting an empty list. Why is this happening? What am I doing wrong?

Upvotes: 0

Views: 259

Answers (1)

Md. Fazlul Hoque

Reputation: 16187

You are getting an empty list because the prices are loaded dynamically from an API by JavaScript. Scrapy does not render JavaScript, so that content is not in the HTML it downloads. You can, however, pull all the required data directly from the API with Scrapy.
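
One way to convince yourself of this is to check whether the price element exists in the raw HTML at all. The following is only a minimal sketch using the standard library: STATIC_HTML and RENDERED_HTML are hypothetical stand-ins for what Scrapy downloads and what the browser shows after JavaScript has run (they are not the real Adidas markup), and count_elements_with_classes is a helper name made up for this illustration.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Count start tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.matches += 1


def count_elements_with_classes(html, wanted):
    """Hypothetical helper: how many elements carry every class in `wanted`?"""
    finder = ClassFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.matches


# Assumed, simplified markup: the downloaded page only has an empty container
# that JavaScript fills in later; the rendered page has the price element.
STATIC_HTML = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<div class="gl-price-item gl-price-item--sale notranslate">$36</div>'
    '</div></body></html>'
)

PRICE_CLASSES = ["gl-price-item", "gl-price-item--sale", "notranslate"]
static_count = count_elements_with_classes(STATIC_HTML, PRICE_CLASSES)
rendered_count = count_elements_with_classes(RENDERED_HTML, PRICE_CLASSES)
print(static_count, rendered_count)
```

In the Scrapy shell, a comparable check is 'gl-price-item' in response.text, or view(response) to open the downloaded page in a browser and see what Scrapy actually received.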

Example spider that requests the API directly:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Send a browser-like User-Agent with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }

        # JSON endpoint queried instead of the HTML search page.
        api_url = 'https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt'

        yield scrapy.Request(
            url=api_url,
            headers=headers,
            callback=self.parse,
            method='GET',
        )

    def parse(self, response):
        # Parse the JSON body of the API response.
        resp = response.json()

        # Each entry in the item list carries a regular price and a sale price.
        for item in resp['raw']['itemList']['items']:
            yield {
                'price': item['price'],
                'salePrice': item['salePrice'],
            }

Output:

{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 150, 'salePrice': 60}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 36}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 10}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 55, 'salePrice': 55}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 15}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 110, 'salePrice': 110}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 22, 'salePrice': 22}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 30}

... and so on
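
If the API changes its layout or an item lacks a sale price, the direct indexing in parse will raise a KeyError. A more tolerant extraction could look like the sketch below. It assumes the same raw → itemList → items path used in the spider; extract_prices is a hypothetical helper name and SAMPLE_PAYLOAD is made-up data for illustration, not a real API response.

```python
def extract_prices(payload):
    """Return price records from a decoded JSON payload, tolerating missing keys."""
    # Fall back to empty containers so a missing level yields an empty result
    # instead of raising KeyError.
    items = payload.get("raw", {}).get("itemList", {}).get("items", [])
    return [
        {"price": item.get("price"), "salePrice": item.get("salePrice")}
        for item in items
    ]


# Made-up payload shaped like the path used in the spider above.
SAMPLE_PAYLOAD = {
    "raw": {
        "itemList": {
            "items": [
                {"price": 40, "salePrice": 36},
                {"price": 25},  # no salePrice key on purpose
            ]
        }
    }
}

prices = extract_prices(SAMPLE_PAYLOAD)
print(prices)
```

Inside the spider, parse could then yield from extract_prices(response.json()) instead of indexing the dictionary directly.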

Upvotes: 1
