Reputation: 161
I am trying to scrape T-shirt prices from the following link: https://www.adidas.com/us/search?q=tshirt
On that page, I am looking at this element:
<div class="gl-price-item gl-price-item--sale notranslate">$36</div>
This is what I ran in the Scrapy shell, and what I got:
>>> fetch('https://www.adidas.com/us/search?q=tshirt')
2022-09-25 23:50:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adidas.com/us/search?q=tshirt> (referer: None)
>>> response.css('div.gl-price-item.gl-price-item--sale.notranslate')
[]
I'd expect response.css('div.gl-price-item.gl-price-item--sale.notranslate') to return at least one item, because the element with the classes gl-price-item gl-price-item--sale notranslate has an entry of $36, but I am getting an empty list. Why is this happening? What am I doing wrong here?
Upvotes: 0
Views: 259
Reputation: 16187
You are getting an empty list because the data is loaded dynamically via an API. Scrapy does not render JavaScript, so the price elements are not in the HTML that it downloads. You can, however, pull all the required data directly from the API with Scrapy.
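One way to confirm this before reaching for the API is to check whether the classes appear at all in the HTML that was downloaded. The sketch below uses only the standard library; `ClassCounter`, `count_elements_with_classes`, and the two HTML strings are hypothetical stand-ins (the first imitates a page shell before JavaScript runs, the second imitates what the browser inspector shows). In the Scrapy shell you would presumably pass `response.text` instead.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains every wanted class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_elements_with_classes(html, wanted):
    """Return how many elements in html carry all of the wanted classes."""
    parser = ClassCounter(wanted)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins, not the real Adidas markup.
raw_html = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
rendered_html = '<div class="gl-price-item gl-price-item--sale notranslate">$36</div>'

wanted = ["gl-price-item", "gl-price-item--sale", "notranslate"]
print(count_elements_with_classes(raw_html, wanted))       # 0
print(count_elements_with_classes(rendered_html, wanted))  # 1
```

If the count is 0 for the downloaded HTML but the element is visible in the browser, the content is most likely injected by JavaScript after the page loads.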
Example:
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Send a browser-like User-Agent with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
        api_url = 'https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt'
        yield scrapy.Request(
            url=api_url,
            headers=headers,
            callback=self.parse,
            method="GET")

    def parse(self, response):
        # The API returns JSON; the products are under raw -> itemList -> items.
        resp = response.json()
        for item in resp['raw']['itemList']['items']:
            yield {
                'price': item['price'],
                'salePrice': item['salePrice']
            }
Output:
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 150, 'salePrice': 60}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 36}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 10}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 55, 'salePrice': 55}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 15}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 110, 'salePrice': 110}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 22, 'salePrice': 22}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 30}
... and so on
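Since the question was about the sale price specifically, the items can also be filtered down to those that are actually discounted. This is only a minimal sketch: `discounted_items` is a hypothetical helper, the `raw` -> `itemList` -> `items` shape is taken from the spider above (the endpoint may change at any time), and `sample` is a hand-built payload using three of the price pairs from the output.

```python
def discounted_items(payload):
    """Return items whose salePrice is lower than price, with the difference."""
    # Fall back to empty containers so a changed payload yields [] instead of a KeyError.
    items = payload.get("raw", {}).get("itemList", {}).get("items", [])
    result = []
    for item in items:
        price = item.get("price")
        sale_price = item.get("salePrice")
        if price is None or sale_price is None:
            continue  # skip items without both prices
        if sale_price < price:
            result.append({
                "price": price,
                "salePrice": sale_price,
                "discount": price - sale_price,
            })
    return result


# Hand-built sample mirroring the structure used in parse() above.
sample = {"raw": {"itemList": {"items": [
    {"price": 35, "salePrice": 21},
    {"price": 35, "salePrice": 35},
    {"price": 40, "salePrice": 36},
]}}}

for row in discounted_items(sample):
    print(row)
# {'price': 35, 'salePrice': 21, 'discount': 14}
# {'price': 40, 'salePrice': 36, 'discount': 4}
```

Inside the spider, the same function could be called as `discounted_items(response.json())` in `parse`, yielding each returned dict.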
Upvotes: 1