Flavié

Reputation: 37

Web scraping: I can't select the tags that I want

I was trying to do some web scraping and ran into a problem. I have this JS script:

const request = require('request');
const cheerio = require('cheerio');
const url = 'https://www.sisal.it/scommesse-matchpoint?filtro=0&schede=man:1:21'; // this is an Italian betting site

request( url, (error, response, html) => {
   if (!error && response.statusCode == 200) {
      const $ = cheerio.load(html);

      let squadre = $("div"); 
      console.log(squadre.text())
   }
})

This returns a very long string containing the text of every div on the site, but the text I want is not in it. I wrote this script because after doing:

const squadre = $("div.*class*"); // *class* stands for the actual class name

It returned nothing even though the selectors were correct. Do you have any idea why I can't select the divs I want?

Upvotes: 1

Views: 268

Answers (1)

Grynets

Reputation: 2525

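This page is created dynamically. The HTML that request downloads and cheerio parses is only the boilerplate markup of a single-page application (SPA); the data you need is loaded later by JavaScript running in the browser.
To scrape this kind of site you need something more advanced than cheerio, which only parses HTML and never executes the page's scripts.
An easy-to-use option is puppeteer, which drives a headless browser.
The code would look something like this: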

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // waitUntil: 'networkidle2' makes goto resolve only once the page's additional requests have settled
  // (no more than 2 open network connections for at least 500 ms).
  await page.goto('https://www.sisal.it/scommesse-matchpoint?filtro=0&schede=man:1:21', {waitUntil: 'networkidle2'});

  const data = await page.evaluate(() => {
     // This callback runs inside the page. Do all your DOM work here and return the result as a JSON string.
     // You can access the DOM with document.querySelector
     // and the other standard DOM methods.
     return JSON.stringify({})
  });

  await browser.close()
})()

Just play around with the puppeteer API and find your own way to handle this task.
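
As a rough sketch of what could go inside page.evaluate, assuming the data you want sits in elements you can target with a CSS selector: the helper names collectText and scrapeText below are made up for illustration, and the selector is whatever class you find when you inspect the rendered page in your browser's dev tools. It also waits for the selector to appear, since networkidle2 alone does not guarantee the elements have been rendered.

```javascript
// Runs inside the page (browser context), so `document` is the live DOM.
// Returns the trimmed, non-empty text of every element matching `selector`.
function collectText(selector) {
  return Array.from(document.querySelectorAll(selector), (el) => el.textContent.trim())
    .filter((text) => text.length > 0);
}

// Hypothetical helper: opens `url` in a headless browser and returns
// the text of all elements matching `selector`.
async function scrapeText(url, selector) {
  const puppeteer = require('puppeteer'); // loaded only when the function is called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Wait until the client-side code has actually rendered a matching element.
    // This throws if nothing matches before the default timeout expires.
    await page.waitForSelector(selector);
    // Puppeteer serializes collectText and calls it in the page with `selector` as its argument.
    return await page.evaluate(collectText, selector);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

You would call it with something like scrapeText(url, 'div.team-name').then(console.log), where 'div.team-name' is only a placeholder for the real class name. Returning a plain array works here because page.evaluate can pass back any serializable value, so JSON.stringify is optional.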

Upvotes: 1
