Dionysian

Reputation: 1235

Parsing JavaScript inside HTML with node.js

I am learning to use request and cheerio to parse a simple HTML file. However, the page contains many script tags, and the actual data resides inside them. For example:

<script> var data = {"name":"John","age":33} </script>

So naturally the interesting part is the "data" variable. Is there a more natural way than using a regex to get that data?

Upvotes: 1

Views: 1574

Answers (2)

jackchen

Reputation: 71

With newer versions of jsdom (v16.4.0, on Node.js 12.6.0), jsdom.jsdom doesn't exist anymore. We can use new JSDOM instead, like below:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<script> var foo = "bar" </script>`, { runScripts: "dangerously" });
console.log(dom.window.foo);  // output is:  bar
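
If a full DOM is more than you need, here is a minimal sketch that uses Node's built-in vm module to evaluate only the script text. It assumes you have already pulled the script body out as a string (for example with cheerio's $('script').html()); the helper name readScriptGlobals is hypothetical, not part of any library. Note that vm is not a security boundary, so, as with runScripts: "dangerously", only do this with pages you trust.

```javascript
const vm = require("vm");

// Hypothetical helper: evaluates the text of one inline <script> in a
// fresh context and returns the object holding the globals it defined.
function readScriptGlobals(scriptText) {
  const sandbox = {};
  // Top-level `var` declarations become properties of the sandbox object.
  // The timeout guards against scripts that never finish.
  vm.runInNewContext(scriptText, sandbox, { timeout: 1000 });
  return sandbox;
}

// Assumed input: the body of the <script> tag from the question.
const scriptText = 'var data = {"name":"John","age":33}';
const globals = readScriptGlobals(scriptText);

console.log(globals.data.name); // John
console.log(globals.data.age);  // 33
```

This only works when the script is self-contained. If it touches document or window, it will throw in the bare context, and jsdom is the better fit.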

Upvotes: 2

Noah

Reputation: 34313

I don't believe cheerio supports parsing inline scripts. However, you can use jsdom for your use case:

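// Written against an older jsdom API; jsdom.jsdom is not available in recent versions.
var jsdom = require('jsdom')
var html = '<script>var data = {"name":"John","age":33} </script>'

// Tell jsdom to fetch and execute scripts
jsdom.defaultDocumentFeatures = {
  FetchExternalResources: ['script'],
  ProcessExternalResources: ['script'],
  MutationEvents: '2.0',
  QuerySelector: false
}

var document = jsdom.jsdom(html)
var window = document.createWindow()
// Globals defined by the inline script are exposed on window
console.dir(window.data)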

Upvotes: 0
