Reputation: 37
I was trying to do some web scraping and ran into a problem. I have this JS script:
const request = require('request');
const cheerio = require('cheerio');
const url = 'https://www.sisal.it/scommesse-matchpoint?filtro=0&schede=man:1:21'; // this is an Italian betting site
request(url, (error, response, html) => {
  if (!error && response.statusCode === 200) {
    const $ = cheerio.load(html);
    let squadre = $("div");
    console.log(squadre.text());
  }
});
This returns a very long string with the text of all the site's divs, but the text I want isn't in it. I wrote this script because after doing:
const squadre = $("div.*class*");
it returned nothing, even though the selectors were correct. Do you have any idea why I can't select the divs I want?
Upvotes: 1
Views: 268
Reputation: 2525
This page is generated dynamically. When you fetch it with request and parse it with cheerio, you only get the boilerplate HTML of a single-page application (SPA); the data you need is loaded later by JavaScript running in the browser.
To scrape this kind of site you need something more advanced than cheerio: a tool that actually runs the page's JavaScript.
An easy-to-use option is puppeteer.
The code would look something like this:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // waitUntil: 'networkidle2' treats navigation as finished once there are
  // no more than 2 network connections for at least 500 ms, which gives the
  // page's additional requests time to complete.
  await page.goto('https://www.sisal.it/scommesse-matchpoint?filtro=0&schede=man:1:21', { waitUntil: 'networkidle2' });
  const data = await page.evaluate(() => {
    // Do all your DOM work here and return the result as a JSON string.
    // You can access the DOM with document.querySelector
    // and the other DOM methods.
    return JSON.stringify({});
  });
  await browser.close();
})();
Play around with the puppeteer API to find the approach that works best for this task.
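As a starting point, here is a minimal sketch of how the extraction step could be filled in. It is not tested against this particular site. The function names scrapeTexts and cleanTexts are made up for illustration, and the selector you pass in is an assumption: replace it with the real class of the divs you see in the browser's dev tools. It uses page.waitForSelector to wait for the dynamically loaded elements and page.$$eval to read their text.

```javascript
// Hypothetical helper: collapse whitespace and drop empty strings
// from the raw text values returned by the page.
function cleanTexts(texts) {
  return texts
    .map((t) => t.replace(/\s+/g, ' ').trim())
    .filter((t) => t.length > 0);
}

// Hypothetical helper: open the page in headless Chrome, wait for the
// elements matching `selector` to appear, and return their cleaned text.
async function scrapeTexts(url, selector) {
  // Loaded here so that cleanTexts can be used without puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Wait until at least one matching element has been rendered.
    await page.waitForSelector(selector);
    // Runs in the browser: collect the text content of every match.
    const texts = await page.$$eval(selector, (els) =>
      els.map((el) => el.textContent)
    );
    return cleanTexts(texts);
  } finally {
    // Close the browser even if navigation or the selector wait fails.
    await browser.close();
  }
}
```

You would call it with something like scrapeTexts(url, 'div.your-class').then(console.log), where 'div.your-class' is a placeholder for the real selector. Returning a plain array from $$eval also means you don't need JSON.stringify, because puppeteer serializes the return value for you.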
Upvotes: 1