Melissa A

Reputation: 39

Empty list from response.css().extract() in Scrapy

I'm new to Scrapy and I have to crawl a webpage for a test. I ran the code below in a terminal, but it returns an empty list and I don't understand why. When I use the same command on another website, like Amazon, with the right selector, it works. Can someone shed some light on this? Thank you so much.

scrapy shell "https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas"

response.css('.tileList-title').extract()

Upvotes: 0

Views: 322

Answers (1)

AvyWam

Reputation: 970

First of all, when I looked at the page's source code, it seemed that you wanted to scrape the title "Iced Teas" in an <h1> header tag. Am I right?

Second, I ran a few scrapy shell sessions to understand the issue. It seems to come down to the User-Agent header sent with the request. Look at the sessions below:

Without a user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas
In [1]: response.css('.tileList-title').extract()                               
Out[1]: []
view(response)  # open the given response in your local web browser for inspection

screenshot without user-agent

With a user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas -s USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

In [1]: response.css('.tileList-title').extract()                               
Out[1]: ['<h1 class="tileList-title" ng-if="$ctrl.listTitle" tabindex="-1">Iced Teas</h1>']
# Now, as you can see, it does not return an empty list.
view(response)

screenshot with user-agent
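Outside scrapy shell, the same setting can be applied to a whole Scrapy project through its settings.py, which Scrapy reads for every crawl (and for scrapy shell when it is run inside the project directory). A minimal sketch, assuming the default project layout; the string is the one from the session above, and there is no guarantee that a given site will keep accepting it:

```python
# settings.py of your Scrapy project (sketch; assumes the default project layout).
# Project-wide equivalent of passing -s USER_AGENT=... on the command line.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

If only one spider needs it, that spider can instead override the value with its custom_settings class attribute, e.g. custom_settings = {"USER_AGENT": "..."}.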

So, for future work, note that you can pass -s SETTING_NAME=value to scrapy shell to override any setting for that session; the list of setting names is in the Scrapy settings documentation.

Also, use view(response) to check whether the request returned the expected content, even when the server answered with a 200 status. In my experience, view(response) shows that the page content, and sometimes even the source code, can be a little different in scrapy shell than in a normal browser, so checking with this shortcut is good practice. The available shortcuts are listed in the Scrapy shell documentation, and they are also printed at the start of every scrapy shell session.
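Besides view(response), it can help to check whether the class your selector targets appears in the HTML Scrapy received at all. Below is a minimal, stdlib-only sketch; ClassCounter and count_class are hypothetical helper names, not part of Scrapy. In a scrapy shell session you could call it as count_class(response.text, "tileList-title"); a result of 0 means a .tileList-title selector has nothing to match in that response.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            # class can hold several space-separated names, so compare tokens.
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name` (hypothetical helper)."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    sample = '<h1 class="tileList-title" tabindex="-1">Iced Teas</h1>'
    print(count_class(sample, "tileList-title"))  # 1
    print(count_class("<html><body></body></html>", "tileList-title"))  # 0
```

Comparing the count for a response fetched without and with USER_AGENT set makes the difference visible without opening a browser.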

Upvotes: 1
