Reputation: 583
I'm trying to scrape a site in Node.js. I followed a great tutorial, but I now realize it might not be what I need: I may have to scrape the JavaScript portion of the page rather than the HTML.
Is that possible?
The reason is that I want to load the content of the snippet below, which I found by inspecting a kayak.com page (see URL below) in Safari (it does not show up in Chrome). It appears to sit inside a script section.
reducer: {"reducerPath":"flights\/results\/react\/reducers\/
Upvotes: 1
Views: 113
Reputation: 13822
UPDATE: Unfortunately, this site uses bot/scrape protection: tools like curl get a page with a bot warning, and headless browser tools like puppeteer get a page with a captcha.
===============
As this line is present in the HTML source code and is not added dynamically by JavaScript execution, you can use something like this, adapted to the API of whichever DOM library you use:
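const extractedString = [...document.querySelectorAll('script')]
  .map(({ textContent }) => textContent)
  // Keep the first script whose text contains the marker substring.
  .find(txt => txt.includes('string'))
  // Optional chaining avoids a TypeError when no script matches.
  ?.match(/regexp/);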
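For plain Node.js, where there is no `document` object, here is a rough sketch of the same idea applied to a raw HTML string. Everything in it is an assumption made for illustration: the helper names `inlineScripts` and `extractFromScripts`, the sample page, and the shape of the `reducer` value are all made up, and the regexp only handles a flat object without nested braces. A real HTML parser would be more robust than a regexp over `<script>` tags, and none of this gets around the bot protection mentioned above.

```javascript
// Hypothetical helper: returns the text of every inline <script> element
// found in an HTML string (regexp-based, so only a rough approximation).
function inlineScripts(html) {
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return [...html.matchAll(scriptRe)].map(match => match[1]);
}

// Hypothetical helper: finds the first script containing `marker`
// and applies `pattern` to its text. Returns null when nothing matches.
function extractFromScripts(html, marker, pattern) {
  const script = inlineScripts(html).find(text => text.includes(marker));
  return script ? script.match(pattern) : null;
}

// Made-up sample page; the content of a real page will differ.
const sampleHtml = `
<html><body>
<script>var unrelated = 1;</script>
<script>window.state = { reducer: {"reducerPath":"example\\/path","count":2} };</script>
</body></html>`;

// Assumes the value after "reducer:" is a flat JSON object (no nested braces).
const match = extractFromScripts(sampleHtml, 'reducer:', /reducer:\s*(\{.*?\})/);
const reducer = match ? JSON.parse(match[1]) : null;
console.log(reducer);
```

Once the captured text is valid JSON, `JSON.parse` turns it into an ordinary object, so escaped slashes such as `\/` come out as plain `/`.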
Upvotes: 1