Reputation: 570
I want to scrape product data with Scrapy. Here is the product link: https://www.ingco.com/products/103803
To check the response, I ran this in the Scrapy shell:
In [2]: response.css('div.d-flex::text').get()
In [3]: response.css('div.d-flex::text').extract()
Out[3]: []
In [4]: response.css('div.d-flex::text').extract
Out[4]: <bound method SelectorList.getall of []>
In [5]: response.css('div.d-flex::text').extract()
Out[5]: []
In [6]: response.css('div.d-flex::text').extract();
In [7]: response.css('div.d-flex').extract();
But it returns nothing. What am I doing wrong?
Upvotes: 0
Views: 141
Reputation: 2619
Use this URL to extract the data: https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803
The data is loaded via a JSON API.
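If you need more than one product, a small helper can map a product page URL to its API URL. This is a minimal sketch that assumes every product page ends in `/products/<numeric id>` and that the endpoint above accepts any product id; the helper name `api_url_for` is hypothetical and only the single URL pair shown in this answer has been observed.

```python
from urllib.parse import urlencode, urlparse

# Endpoint taken from this answer; assumed (not confirmed) to work for any product id.
API_BASE = "https://webcenterapi.ingco.com/website-product/product-info-detail"


def api_url_for(product_page_url: str) -> str:
    """Hypothetical helper: turn .../products/<id> into the JSON API URL for that id."""
    path = urlparse(product_page_url).path.rstrip("/")
    product_id = path.rsplit("/", 1)[-1]
    if not product_id.isdigit():
        raise ValueError(f"no numeric product id in {product_page_url!r}")
    return f"{API_BASE}?{urlencode({'id': product_id})}"


print(api_url_for("https://www.ingco.com/products/103803"))
# https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803
```

In a spider you would request `api_url_for(page_url)` instead of the product page itself.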
In [3]: url ="https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803"
In [4]: r = scrapy.Request(url)
In [5]: fetch(r)
2021-01-11 13:42:14 [scrapy.core.engine] INFO: Spider opened
2021-01-11 13:42:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803> (referer: None)
In [6]: import json
In [7]: jsonresponse = json.loads(response.text)
In [8]: jsonresponse['data']
Out[8]:
{'id': 103803,
'productNo': 'HPWR14008',
'productName': 'High pressure washer',
'keyData1': '220-240V~50/60Hz',
'keyData2': 'Pure copper wire brush motor',
'keyData3': 'Input power:1400W',
'parameter': 'Voltage: 220-240V~50/60Hz<br>Carbon brush motor<br>Pure copper wire<br>Input power:1400W<br>Max Pressure:130Bar (1900PSI)<br>Flow rate:5.5L/min<br>Auto stop system<br>1 set water spray gun (AMSG028 )<br>5m high pressure hose( AHPH5028)<br>Packed by color box',
'isIndustry': 1,
'categoryId': 11,
'categoryName': 'Garden tools',
'video': [{'video': 'https://www.ingco.com/userfiles/32959185488b4b11936e318b589f1edc/flash/video/20181210/HPWR14008.mp4',
'videoType': 1}],
'picture': ['https://www.ingco.com/userfiles/1/images/photo/20200730/HPWR14008.jpg'],
'relevant': [],
'annex': []}
In [9]: jsonresponse['data']['productNo']
Out[9]: 'HPWR14008'
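Once you have the JSON text, turning it into an item is plain dictionary work. The sketch below uses only field names visible in the output above (`id`, `productNo`, `productName`, `categoryName`, `parameter`, `picture`, `video`); the output keys (`product_no`, `parameters`, and so on) are hypothetical names chosen for illustration, and the sample payload is a trimmed copy of the response shown above, not a fresh API call.

```python
import json


def parse_product(payload: str) -> dict:
    """Flatten the API's JSON text into a simple item dict (hypothetical key names)."""
    data = json.loads(payload)["data"]
    return {
        "id": data["id"],
        "product_no": data["productNo"],
        "name": data["productName"],
        "category": data["categoryName"],
        # 'parameter' is a single string with <br> separators; split it into a list
        "parameters": [p.strip() for p in data.get("parameter", "").split("<br>") if p.strip()],
        "images": data.get("picture", []),
        "videos": [v["video"] for v in data.get("video", [])],
    }


# Trimmed copy of the response shown above
sample = json.dumps({"data": {
    "id": 103803,
    "productNo": "HPWR14008",
    "productName": "High pressure washer",
    "categoryName": "Garden tools",
    "parameter": "Voltage: 220-240V~50/60Hz<br>Carbon brush motor<br>Input power:1400W",
    "picture": ["https://www.ingco.com/userfiles/1/images/photo/20200730/HPWR14008.jpg"],
    "video": [],
}})

item = parse_product(sample)
print(item["product_no"], item["parameters"])
# HPWR14008 ['Voltage: 220-240V~50/60Hz', 'Carbon brush motor', 'Input power:1400W']
```

Inside a Scrapy callback you could `yield parse_product(response.text)`; Scrapy 2.2 and later also provide `response.json()` if you prefer to skip `json.loads`.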
Upvotes: 1
Reputation: 28266
If you look at the actual HTML source of the page (Ctrl+U in most browsers), you will see that it doesn't contain the information you want to scrape.
The product details are loaded by JavaScript from an API URL (https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803).
The data is in JSON format, and the API appears to be publicly accessible, so your job should be pretty simple.
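The Ctrl+U check can also be done in code: collect the text nodes of the raw HTML (ignoring `<script>` and `<style>` contents) and see which expected values are absent. This is a minimal sketch using the standard library's `html.parser`; the function name and the sample HTML are hypothetical stand-ins, not the real page source. To test the real page, pass it `response.text` from the Scrapy shell.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_static_html(html: str, expected: list[str]) -> list[str]:
    """Return the expected strings that do not appear in the page's static text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [s for s in expected if s not in text]


# Made-up shell of a JavaScript-rendered page: the data only arrives later via the API
js_page = '<html><body><div id="app"></div><script>load("103803")</script></body></html>'
print(missing_from_static_html(js_page, ["HPWR14008", "High pressure washer"]))
# ['HPWR14008', 'High pressure washer']
```

If everything you want comes back as missing, CSS selectors such as `div.d-flex::text` have nothing to match, and the JSON API is the place to look.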
Upvotes: 2