When I try to scrape a React website using Node.js, I only get the contents of the index.html file, not the tags that the website actually renders. Here is what I have tried:
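// Fetch the page with `request` and load the response body into Cheerio.
const request = require("request");
const cheerio = require("cheerio");

const URL = "https://pydata-jal.netlify.com/";

request(URL, (err, res, body) => {
  if (err) {
    console.error(err);
    return;
  }
  if (res.statusCode === 200) {
    // Parse the HTML returned by the server and print it back out.
    const $ = cheerio.load(body);
    console.log($.html());
  }
});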
What should I do to get all of the tags that the React website renders?
Also, is it legal to scrape a site such as hackernoon (just as an example)?
Cheerio only parses HTML that has already been rendered (e.g. static HTML); it does not execute the page's JavaScript. To get the markup that React renders, you need a headless browser controlled by a tool such as Puppeteer.
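A minimal sketch of the headless-browser approach, assuming Puppeteer has been installed with `npm install puppeteer`. The function names `scrapeRendered` and `uniqueSortedTags` are hypothetical, chosen for illustration; the Puppeteer calls (`launch`, `newPage`, `goto` with `waitUntil: "networkidle0"`, `content`, `evaluate`, `close`) are part of its public API. Waiting for `networkidle0` is only a heuristic for "React has finished rendering"; on pages that keep polling the network, waiting for a known element with `page.waitForSelector()` may be more reliable.

```javascript
// Sketch: render a client-side React page in headless Chromium via Puppeteer,
// then read the resulting DOM. Assumes `npm install puppeteer` has been run.
// `scrapeRendered` and `uniqueSortedTags` are hypothetical helper names.

// Deduplicate and sort tag names collected from the rendered DOM.
function uniqueSortedTags(tagNames) {
  return [...new Set(tagNames.map((name) => name.toLowerCase()))].sort();
}

// Load `url` in a headless browser and return the rendered HTML together with
// the distinct tag names found in the live DOM.
async function scrapeRendered(url) {
  // Required here so this file can be loaded without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Treat navigation as finished once there have been no network
    // connections for 500 ms, giving the React bundle time to run and render.
    await page.goto(url, { waitUntil: "networkidle0" });
    // Serialize the current DOM (after JavaScript has run) to an HTML string.
    const html = await page.content();
    // This callback runs inside the page: collect every element's tag name.
    const tagNames = await page.evaluate(() =>
      Array.from(document.querySelectorAll("*"), (el) => el.tagName)
    );
    return { html, tags: uniqueSortedTags(tagNames) };
  } finally {
    // Always shut the browser down, even if navigation fails.
    await browser.close();
  }
}
```

For example, `scrapeRendered("https://pydata-jal.netlify.com/").then(({ html, tags }) => console.log(tags))` would print the tag names; the `html` string can also be passed to `cheerio.load()` exactly as in the question, since it now contains the React-rendered markup.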