There is a web page that is partially generated with JS: https://www.ncbi.nlm.nih.gov/genome/genomes/971
I want to scrape the links in the FTP column. All of them are generated by JS.
By default, Scrapy gets only the HTML without executing JS. How can I change that?
If you are about to scrape a page that generates its content dynamically, the first thing to do is look for an API that the page calls. In your browser's developer tools, look for XHR requests in the Network tab. For the page you refer to, I can see a request for
If you look in the response, you'll see that it contains the links that are under the FTP column on the page. You can simply use this API to get the information you need.
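A minimal sketch of the "use the API directly" route, assuming the endpoint returns JSON. Since the exact shape of that response isn't shown here, the hypothetical helper `iter_ftp_links` below does not assume any field names: it walks every nested dict and list and keeps strings that start with an FTP-like prefix. The prefixes in `FTP_PREFIXES` are an assumption and may need adjusting to what the response actually contains.

```python
import json
from typing import Any, Iterator

# Assumed prefixes for the links shown in the FTP column (hypothetical; adjust as needed).
FTP_PREFIXES = ("ftp://", "https://ftp.", "http://ftp.")


def iter_ftp_links(node: Any) -> Iterator[str]:
    """Recursively yield every string in a decoded JSON value that looks like an FTP link."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_ftp_links(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_ftp_links(item)
    elif isinstance(node, str) and node.startswith(FTP_PREFIXES):
        yield node


def ftp_links_from_text(body: str) -> list[str]:
    """Decode a JSON response body and return the FTP-like links it contains, in order."""
    return list(iter_ftp_links(json.loads(body)))
```

Inside a Scrapy callback you would request the API URL instead of the HTML page and then yield the links, e.g. `for link in ftp_links_from_text(response.text): yield {"ftp": link}`. No JS rendering is needed on this route, which keeps the spider fast and simple.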
If you really want to render the page and scrape it, I suggest you use Splash. The best way to integrate it with Scrapy is the scrapy-splash library.
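A minimal sketch of the scrapy-splash wiring, following the settings listed in the scrapy-splash README. It assumes a Splash instance is reachable at `http://localhost:8050` (for example started with `docker run -p 8050:8050 scrapinghub/splash`). These lines go in the project's `settings.py`:

```python
# settings.py: route requests through a locally running Splash instance (assumed address).
SPLASH_URL = "http://localhost:8050"

# Splash-aware downloader middlewares; the order values come from the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoids storing duplicate Splash arguments for every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filtering and HTTP cache storage that take Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

In the spider, yield `SplashRequest(url, self.parse, args={"wait": 2})` (imported from `scrapy_splash`) instead of a plain `scrapy.Request`. The response passed to `parse` is then the HTML after the page's JS has run, so the FTP column can be selected as usual. The 2-second `wait` is only a guess at how long the page needs to finish rendering.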