Reputation: 21764
How can I screen-scrape a multi-page application? I want to do this using JavaScript. Here are the approaches I have considered and the problems I have run into.
Using the Fetch web API in a Node application to get the web pages
Problem: The web pages don't load properly when fetched. My guess is that none of the JavaScript on the page runs when the page is fetched.
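That guess can be checked with a minimal sketch (assuming Node 18+, where `fetch` is built in) that reproduces the problem locally. A throwaway HTTP server serves a page whose only content is added by an inline script. `fetch` returns that markup with the `<div>` still empty, because it downloads the HTML and never executes scripts. The page and the `fetchRawHtml` helper are hypothetical, made up for illustration.

```javascript
// Sketch: shows that fetch() returns a page's HTML without running its scripts.
// Assumes Node 18+ (built-in fetch); uses only Node's own http module.
const http = require('node:http');

// Hypothetical page whose visible content is created only by client-side JS.
const PAGE = `<!doctype html>
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById('app').textContent = 'Rendered by JavaScript';
  </script>
</body></html>`;

// Serve PAGE from a throwaway local server, fetch it once, then shut down.
async function fetchRawHtml() {
  const server = http.createServer((req, res) => {
    // 'Connection: close' stops a keep-alive socket from holding the process open.
    res.writeHead(200, { 'Content-Type': 'text/html', Connection: 'close' });
    res.end(PAGE);
  });
  // Port 0 asks the OS for any free port.
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();
  try {
    const response = await fetch(`http://127.0.0.1:${port}/`);
    return await response.text();
  } finally {
    server.close();
  }
}

fetchRawHtml().then((html) => {
  // The <div> is still empty: fetch downloaded the markup but executed nothing.
  console.log(html.includes('<div id="app"></div>')); // true
});
```

A real browser (or a headless one) would run the inline script, and `#app` would then contain the text.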
Running JavaScript from the console
This is a very simple way to inject JavaScript straight into the document. One problem is that opening the web page in a browser and pasting code into the console is manual work. Another problem is that while this works for a single-page application, it becomes very cumbersome for multi-page applications.
Is there a better approach that solves the problems I have encountered?
Upvotes: 0
Views: 72
Reputation: 98
If you want to save website content (HTML, JS and CSS files, images) to the file system, take a look at the website-scraper package for Node.js: https://www.npmjs.com/package/website-scraper
It also has a plugin for PhantomJS, which lets it handle single-page applications.
Upvotes: 0
Reputation: 2801
It depends on what you are doing. If you just want to get some data from a website, injecting JS into the page is the way to go.
But as you said, that is manual work, so I assume you want to scrape the sites and perhaps save the data. In that case a server-side script is better suited. To fix the problem of the page's JavaScript not running, you can use tools such as PhantomJS or Horseman.
Take a look at this: https://medium.com/@designman/building-a-performant-web-scraper-in-node-js-5f4449674163
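For a multi-page site, such a server-side script is mostly a crawl loop wrapped around a headless browser. Below is a minimal sketch of that loop, not any particular library's API: `crawl`, `extractLinks` and the `renderPage` callback are hypothetical names. `renderPage(url)` is assumed to return the HTML *after* the page's JavaScript has run. PhantomJS is no longer maintained, so as an example of a current tool: with Puppeteer, `renderPage` could call `page.goto(url, { waitUntil: 'networkidle0' })` and return `await page.content()`. The in-memory `fakeSite` only stands in for a real browser so the sketch runs on its own, and the regex link extraction is a deliberate simplification; a proper HTML parser would be safer.

```javascript
// Sketch: breadth-first crawl of a multi-page site, restricted to one origin.
// renderPage(url) is a hypothetical callback you supply; it must resolve to the
// page's HTML after its JavaScript has run (e.g. taken from a headless browser).

// Rough href extraction via regex; a real scraper should use an HTML parser.
function extractLinks(html, baseUrl) {
  const links = [];
  for (const match of html.matchAll(/href\s*=\s*["']([^"']+)["']/gi)) {
    try {
      const url = new URL(match[1], baseUrl); // resolve relative links
      url.hash = ''; // treat page#section as the same page
      links.push(url.href);
    } catch {
      // Skip hrefs that are not valid URLs.
    }
  }
  return links;
}

async function crawl(startUrl, renderPage, { maxPages = 50 } = {}) {
  const start = new URL(startUrl);
  start.hash = '';
  const origin = start.origin;
  const seen = new Set([start.href]); // URLs already queued or visited
  const queue = [start.href];
  const pages = new Map(); // url -> rendered HTML, in visit order

  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift();
    const html = await renderPage(url);
    pages.set(url, html);
    for (const link of extractLinks(html, url)) {
      // Follow only same-origin links that have not been seen yet.
      if (new URL(link).origin === origin && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages;
}

// Usage: an in-memory fake site stands in for the headless browser here.
const fakeSite = {
  'https://example.test/': '<a href="/a">A</a> <a href="/b">B</a>',
  'https://example.test/a': '<a href="/b">B</a> <a href="https://other.test/">Other</a>',
  'https://example.test/b': '<a href="/#top">Home</a>',
};

crawl('https://example.test/', async (url) => fakeSite[url] ?? '').then((pages) => {
  // Visits /, /a and /b once each; the other.test link is never followed.
  console.log([...pages.keys()]);
});
```

The `seen` set means each page is rendered only once even when many pages link to it, the same-origin check keeps the crawl from wandering off to external sites, and `maxPages` is an arbitrary safety cap.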
Upvotes: 1