Difficulty in web-scraping data using scrapy

Question

I am trying to scrape data from https://www.ta.com/portfolio/business-services using Scrapy, but the result comes back empty. I want to extract the href values inside div.tiles.js-portfolio-tiles using this code:

response.css("div.tiles.js-portfolio-tiles a::attr(href)").extract()

I think this has something to do with the ::before that appears just before this element, but maybe not. How do I go about extracting these links?

Calimocho · Accepted Answer

The elements that you are interested in retrieving are loaded by your browser using JavaScript. By default, Scrapy cannot load such elements because it is not a browser: it simply retrieves the raw HTML and does not execute any scripts.

The Scrapy shell is an invaluable tool for inspecting what is available in the response that Scrapy receives.

This set of commands will open the response in your default web browser:

$ scrapy shell
>>> fetch("https://www.ta.com/portfolio/business-services")
>>> view(response)

As you can see, the js-portfolio-tiles elements are not visible because they have not been loaded.
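To illustrate the difference, here is a minimal sketch that uses only Python's standard library rather than Scrapy itself. Both HTML snippets and the /portfolio/example-co link are made up for illustration: one imitates the raw HTML that Scrapy receives, the other imitates the page after the browser has run the JavaScript. TileLinkParser and tile_links are hypothetical helpers that roughly mimic the selector from the question.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TileLinkParser(HTMLParser):
    """Collect href values of <a> tags inside div.tiles.js-portfolio-tiles."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside the target div (0 = outside)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.hrefs.append(attrs["href"])
        elif tag == "div":
            classes = set((attrs.get("class") or "").split())
            if {"tiles", "js-portfolio-tiles"} <= classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def tile_links(html):
    """Return all hrefs found inside the tiles container of `html`."""
    parser = TileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Made-up example: what a non-browser client receives (container is empty).
RAW_HTML = '<div class="tiles js-portfolio-tiles"></div>'

# Made-up example: the same container after JavaScript has added a tile.
RENDERED_HTML = (
    '<div class="tiles js-portfolio-tiles">'
    '<a href="/portfolio/example-co">Example Co</a>'
    '</div>'
)

print(tile_links(RAW_HTML))       # []
print(tile_links(RENDERED_HTML))  # ['/portfolio/example-co']
```

The selector itself is fine in both cases. It returns nothing on the raw HTML simply because the anchors are not there yet.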

I had a look at the AJAX requests in the network panel of the developer tools, and it appears that the information you require may be available in an XHR request. If it is not, you will need additional software to execute the JavaScript, namely scrapy-splash or Selenium. I would advise exploring the AJAX (XHR) request first, though, as this will be much faster and easier.
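If the XHR request does return the data as JSON, pulling the links out is usually straightforward. The following is only a sketch under assumptions: I have not confirmed the actual endpoint or the shape of its response, so the sample payload, the "url" key and the extract_links helper are all hypothetical and would need to be adapted to whatever the network panel actually shows.

```python
import json


def extract_links(payload_text, key="url"):
    """Return the value stored under `key` for each item in a JSON array.

    Assumes the payload is a JSON list of objects; items without `key`
    are skipped. Both assumptions are hypothetical.
    """
    items = json.loads(payload_text)
    return [item[key] for item in items if key in item]


# Hypothetical payload standing in for the body of the XHR response.
sample = (
    '[{"name": "Example Co", "url": "/portfolio/example-co"},'
    ' {"name": "No Link Co"}]'
)

print(extract_links(sample))  # ['/portfolio/example-co']
```

In a spider, you would send a scrapy.Request to the XHR URL copied from the network panel and pass response.text to a function like this in the callback.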

See this question for additional details on using your browser's dev tools to inspect AJAX requests.
