Kyle D. Hebert

Reputation: 171

I don't understand why this XPath expression is not working as a Scrapy selector

I am just beginning to learn Scrapy, and I do not understand why the XPath described below is returning zero results.

I am trying to build a spider that crawls http://www.foodsafety.gov/recalls/recent/index.html

Specifically, in my testing with the Scrapy shell, I was trying to extract the headlines. Using the inspector in Safari's developer console, I determined that the XPath for the headline text is //div[@id="recallList"]/h2/a/text(). Using find in the developer console, I was able to locate 25 headlines with that XPath.

However, when I use the Scrapy shell to test the XPath I get an empty list using

>> response.xpath('//div[@id="recallList"]/h2/a/text()').extract()

I am using

>> scrapy shell "http://www.foodsafety.gov/recalls/recent/index.html"

to crawl the site.

Upvotes: 0

Views: 408

Answers (1)

Rahul

Reputation: 3386

The response is empty because the content is loaded through JavaScript, which Scrapy does not execute as of now, so the headlines are not in the HTML that Scrapy downloads. If you look in the network panel in the developer console, you will see that another request is made to this URL: http://ajax.googleapis.com/ajax/services/feed/load?v=1.0&callback=jsonp1455174771252&q=http://www.fda.gov/AboutFDA/ContactFDA/StayInformed/RSSFeeds/FoodSafety/rss.xml&num=13 which returns JSON (wrapped in the JSONP callback named in the URL). You can request this URL directly to get all your data.
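As a rough sketch of how you might pull the headlines out of that response: the snippet below strips the JSONP wrapper and reads the entry titles. The helper names (strip_jsonp, extract_titles), the sample body, and the responseData -> feed -> entries -> title layout are my assumptions for illustration, not something confirmed above, so check the real payload and adjust the keys.

```python
import json
import re

# Matches an optional JSONP wrapper such as: jsonp1455174771252({...});
_JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def strip_jsonp(body):
    """Return the JSON text inside a JSONP wrapper, or body unchanged if there is none."""
    match = _JSONP_RE.match(body)
    return match.group(1) if match else body


def extract_titles(body):
    """Parse the response body and return the title of each feed entry.

    The responseData -> feed -> entries -> title layout is an assumption
    about the payload; adjust the keys to match the real response.
    """
    data = json.loads(strip_jsonp(body))
    feed = (data.get("responseData") or {}).get("feed") or {}
    return [entry["title"] for entry in feed.get("entries", []) if "title" in entry]


# Hypothetical sample body, made up for illustration only.
sample = (
    'jsonp1455174771252({"responseData": {"feed": {"entries": '
    '[{"title": "Example recall A"}, {"title": "Example recall B"}]}}})'
)
print(extract_titles(sample))
```

In the Scrapy shell you could then call fetch() on the URL above and pass the body text (response.text on recent Scrapy versions) to extract_titles, assuming the endpoint still responds as described.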

Upvotes: 1
