Hannah James

Reputation: 570

Scrapy shell: response.css returns empty output

I want to scrape product data with Scrapy. Here is the product link: https://www.ingco.com/products/103803

To check the response, I used this code:

In [2]: response.css('div.d-flex::text').get()

In [3]: response.css('div.d-flex::text').extract()
Out[3]: []

In [4]: response.css('div.d-flex::text').extract
Out[4]: <bound method SelectorList.getall of []>

In [5]: response.css('div.d-flex::text').extract()
Out[5]: []

In [6]: response.css('div.d-flex::text').extract();

In [7]: response.css('div.d-flex').extract();


But it returns nothing. What am I doing wrong?

Upvotes: 0

Views: 141

Answers (2)

Samsul Islam

Reputation: 2619

Use this URL to extract the data: https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803. The product data is loaded through a JSON API, not included in the page HTML.

In [3]: url ="https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803"

In [4]: r = scrapy.Request(url)

In [5]: fetch(r)
2021-01-11 13:42:14 [scrapy.core.engine] INFO: Spider opened
2021-01-11 13:42:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803> (referer: None)

In [6]: import json

In [7]: jsonresponse = json.loads(response.text)

In [8]: jsonresponse['data']
Out[8]: 
{'id': 103803,
 'productNo': 'HPWR14008',
 'productName': 'High pressure washer',
 'keyData1': '220-240V~50/60Hz',
 'keyData2': 'Pure copper wire brush motor',
 'keyData3': 'Input power:1400W',
 'parameter': 'Voltage: 220-240V~50/60Hz<br>Carbon brush motor<br>Pure copper wire<br>Input power:1400W<br>Max Pressure:130Bar (1900PSI)<br>Flow rate:5.5L/min<br>Auto stop system<br>1 set water spray gun (AMSG028 )<br>5m high pressure hose( AHPH5028)<br>Packed by color box',
 'isIndustry': 1,
 'categoryId': 11,
 'categoryName': 'Garden tools',
 'video': [{'video': 'https://www.ingco.com/userfiles/32959185488b4b11936e318b589f1edc/flash/video/20181210/HPWR14008.mp4',
   'videoType': 1}],
 'picture': ['https://www.ingco.com/userfiles/1/images/photo/20200730/HPWR14008.jpg'],
 'relevant': [],
 'annex': []}

In [9]: jsonresponse['data']['productNo']
Out[9]: 'HPWR14008'
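Once you have the JSON, it is easy to turn it into an item. Below is a minimal sketch, assuming the API always wraps the product in a top-level `data` key with the field names seen in the shell output (`productNo`, `productName`, `categoryName`, `parameter`, `picture`, `video`). The function name `parse_product` and the output keys are hypothetical; the only real processing is splitting the HTML `parameter` string on `<br>` tags into a list of specs.

```python
import json
import re


def parse_product(payload: str) -> dict:
    """Hypothetical helper: flatten the API's JSON text into a simple dict."""
    # Assumption: the product lives under the top-level "data" key
    data = json.loads(payload)["data"]
    # "parameter" is an HTML fragment; split on <br> / <br/> and drop empty pieces
    raw_params = data.get("parameter") or ""
    specs = [s.strip() for s in re.split(r"<br\s*/?>", raw_params) if s.strip()]
    return {
        "product_no": data.get("productNo"),
        "name": data.get("productName"),
        "category": data.get("categoryName"),
        "specs": specs,
        "images": data.get("picture") or [],
        # each entry in "video" is a dict with a "video" URL and a "videoType"
        "videos": [v["video"] for v in data.get("video") or []],
    }
```

Inside a spider callback for the API request you could then `yield parse_product(response.text)`.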

Upvotes: 1

stranac

Reputation: 28266

If you look at the page's actual HTML source (Ctrl+U in most browsers), you will see that it doesn't contain the information you want to scrape. That is why your CSS selectors match nothing.
The product details are loaded by JavaScript from an API URL (https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803).

The data is in JSON format, and the API appears to be publicly available, so your job should be fairly simple.
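To go from a product page to its API request, you can derive the API URL from the page URL. This is a sketch under an assumption that only holds for the example shown here: the last path segment of the product page (`103803` in `/products/103803`) is the same value the API expects as its `id` query parameter. The helper name `api_url_for` is hypothetical.

```python
from urllib.parse import urlparse

# API endpoint taken from the URL quoted above
API_ENDPOINT = "https://webcenterapi.ingco.com/website-product/product-info-detail"


def api_url_for(product_url: str) -> str:
    """Hypothetical helper: map .../products/<id> to the JSON API URL."""
    # Assumption: the last path segment of the product page is the numeric product id
    product_id = urlparse(product_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not product_id.isdigit():
        raise ValueError(f"no numeric product id in {product_url!r}")
    return f"{API_ENDPOINT}?id={product_id}"
```

In a spider you would request `api_url_for(page_url)` instead of the page itself and parse the body with `json.loads(response.text)`; newer Scrapy versions also offer `response.json()` for this.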

Upvotes: 2
