How to scrape the JavaScript portion of a webpage?

Question

I'm trying to scrape a site in Node.js. I followed a great tutorial, but I now realize it might not cover what I need: I may have to scrape the JavaScript portion of the page rather than the HTML.

Is that possible?

The reason is that I want to load the content of the snippet below, which I found by inspecting a kayak.com page (see URL below) in Safari (it does not show up in Chrome). It appears to be inside a script section.

reducer: {"reducerPath":"flights\/results\/react\/reducers\/

https://www.kayak.com/flights/TYO-PAR/2019-07-05-flexible/2019-07-14-flexible/1adults/children-11?fs=cfc=1;legdur=-960;stops=~0;bfc=1&sort=bestflight_a&attempt=2&lastms=1550392662619

vsemozhebuty · Accepted Answer

UPDATE: Unfortunately, this site uses bot/scraping protection: tools like curl receive a page with a bot warning, and headless browser tools like Puppeteer receive a page with a CAPTCHA.

===============

As this line is present in the HTML source code and is not added dynamically by JavaScript execution, you can use something like this with the appropriate library API:

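// 'string' and /regexp/ are placeholders for your own marker text and pattern.
const extractedString = [...document.querySelectorAll('script')]
  .map(({ textContent }) => textContent)
  .find(txt => txt.includes('string'))
  ?.match(/regexp/); // undefined if no script contains the marker, null if the pattern fails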
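
For plain Node.js, where there is no `document`, the same idea can be sketched without a DOM library. This is only a rough illustration under stated assumptions: `sampleHtml` is a made-up stand-in for a fetched page (not kayak.com's real markup), `getInlineScripts` and `extractFromScripts` are hypothetical helper names, and the regular expression is tailored to the sample. A proper HTML parser would likely be more robust than a regex on real pages.

```javascript
// Hypothetical sample HTML standing in for a fetched page (assumption, not real site markup).
const sampleHtml = `
<html><body>
<script>var unrelated = 1;</script>
<script>init({ reducer: {"reducerPath":"example\\/path","count":2} });</script>
</body></html>`;

// Collect the text content of every inline <script> element in the HTML string.
function getInlineScripts(html) {
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return [...html.matchAll(scriptPattern)].map((m) => m[1]);
}

// Find the first script containing `marker`, then return the first
// capture group of `pattern` from it, or null if either step fails.
function extractFromScripts(html, marker, pattern) {
  const script = getInlineScripts(html).find((txt) => txt.includes(marker));
  if (!script) return null;
  const match = script.match(pattern);
  return match ? match[1] : null;
}

// The pattern below fits the sample only; adjust it to the actual page source.
const raw = extractFromScripts(
  sampleHtml,
  'reducerPath',
  /reducer:\s*(\{.*?\})\s*\}\);/
);

// The captured text is JSON, so it can be parsed into an object.
const reducer = raw ? JSON.parse(raw) : null;
console.log(reducer);
```

Returning `null` instead of throwing when nothing matches makes it easier to tell a missing script apart from a protection page served in place of the real content.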
