user1283776

Reputation: 21764

How can I screen scrape a multi-page application using JavaScript?

How can I screen scrape a multi-page application? I want to do this using JavaScript. Here are the approaches I have considered and the problems I have encountered.

Using the Fetch web API in a Node application to get the web pages

Problem: The web pages don't load properly when fetched. I guess the JavaScript on the page does not run when the page is fetched, so any content rendered by scripts is missing.
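To illustrate the problem, here is a minimal sketch. The markup in `rawHtml` is made up and stands in for what a server might send for a page that renders its content with a script; `textWithoutScripts` is a hypothetical helper, not part of any library. A plain HTTP fetch returns only this text, and nothing executes the script inside it.

```javascript
// Hypothetical markup, as a server might send it for a client-rendered page.
const rawHtml = `<!doctype html>
<html>
  <body>
    <div id="app"></div>
    <script>
      document.getElementById('app').textContent = 'Price: 42';
    </script>
  </body>
</html>`;

// Rough text extraction: drop script source, then drop the remaining tags.
// A regex is only a stand-in for a real HTML parser here.
function textWithoutScripts(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, '')
    .replace(/<[^>]+>/g, '')
    .trim();
}

// The container is empty because the script was never run,
// so this logs an empty string in quotes.
console.log(JSON.stringify(textWithoutScripts(rawHtml)));
```

The string 'Price: 42' exists only inside the script source, so it never reaches the `#app` element unless something actually runs the page's JavaScript.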

Running JavaScript from the console

This is a very simple way to inject JavaScript straight into the document. But one problem is that opening the web page in a browser and pasting code into the console is manual work. Another problem is that while this works for a single-page application, it becomes very cumbersome for multi-page applications, because the work has to be repeated on every page.

What better approach exists that solves the problems I have encountered?

Upvotes: 0

Views: 72

Answers (2)

s0ph1e

Reputation: 98

If you want to save website content (HTML, JS, and CSS files, images) to the file system, you can take a look at the website-scraper package for Node.js: https://www.npmjs.com/package/website-scraper

It also has a plugin for PhantomJS, which makes it possible to handle single-page applications.

Upvotes: 0

Ionel Lupu

Reputation: 2801

It depends on what you are doing. If you just want to get some data from a website, then injecting JS into the page is the way to go.

But as you said, that is manual work, from which I deduce that you want to scrape the sites automatically and perhaps save the data. In that case a server-side script is better suited. To fix the problem of the page's JavaScript not being run, you can use a headless browser tool such as PhantomJS or Horseman.
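As a rough sketch of what such a server-side script could look like for a multi-page site, the crawler below follows same-origin links breadth-first. The names `extractSameOriginLinks`, `crawl`, `loadPage`, and `maxPages` are all made up for this example. `loadPage` is a function you supply that takes a URL and returns a promise for the page's HTML; this is the place where a headless browser would be plugged in so that the page's scripts run before the HTML is read.

```javascript
// Collect same-origin links from an HTML string.
// The regex is a simplification; a real HTML parser handles more cases.
function extractSameOriginLinks(html, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const links = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    let resolved;
    try {
      resolved = new URL(match[1], pageUrl); // resolve relative links
    } catch {
      continue; // skip malformed URLs
    }
    resolved.hash = ''; // treat /page and /page#section as the same page
    if (resolved.origin === origin) links.add(resolved.href);
  }
  return [...links];
}

// Breadth-first crawl starting at startUrl, visiting at most maxPages pages.
// loadPage(url) must return a promise that resolves to the page's HTML.
async function crawl(startUrl, loadPage, maxPages = 20) {
  const queue = [new URL(startUrl).href];
  const seen = new Set(queue);
  const pages = new Map(); // url -> html

  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift();
    const html = await loadPage(url);
    pages.set(url, html);

    for (const link of extractSameOriginLinks(html, url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages;
}
```

For pages that are already complete in the raw response, `loadPage` could be as simple as `(url) => fetch(url).then((res) => res.text())` on a Node version with a global `fetch`. For pages that build their content with scripts, `loadPage` would instead open the URL in a headless browser and return the rendered HTML.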

Take a look at this: https://medium.com/@designman/building-a-performant-web-scraper-in-node-js-5f4449674163

Upvotes: 1
