iMSn20

Reputation: 255

Getting <span> text when web scraping

I'm using Puppeteer and jsDOM to scrape this site: https://www.lcfc.com/matches/results.

I want the team names for every match, so in the browser console I run this:

document.querySelectorAll('.match-item__team-container span')
  .forEach(element => console.log(element.textContent));

In the console the names print fine, but when I use the same selector in my code, nothing is printed.

This is my code:

const puppeteer = require('puppeteer');
const jsdom = require('jsdom');
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto('https://www.lcfc.com/matches/results');
    const body = await response.text();
    const { window: { document } } = new jsdom.JSDOM(body);

    document.querySelectorAll('.match-item__team-container span')
      .forEach(element => console.log(element.textContent));

    await browser.close();
  } catch (error) {
    console.error(error);
  }
})();

I don't get any error. Any suggestions? Thank you.

I have now tried this code, but it still does not work. Here is the code and a picture of the console:

const puppeteer = require('puppeteer');
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.waitForSelector('.match-item__team-container span');
    const data = await page.evaluate(() => {
      document.querySelectorAll('.match-item__team-container span')
          .forEach(element => console.log(element.textContent));
    });
    //listen to console events in the chrome tab and log it in nodejs process
    page.on('console', consoleObj => console.log(consoleObj.text()));

    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();

[Image: screenshot of the console output]

Upvotes: 1

Views: 526

Answers (1)

Metabolic

Reputation: 2904

Do it the Puppeteer way: wait for the selector to appear with waitForSelector, then use evaluate to run your code in the page.

// register the listener first so console messages from the Chrome tab reach the Node.js process
page.on('console', consoleObj => console.log(consoleObj.text()));
// load the page before waiting for anything on it
await page.goto('https://www.lcfc.com/matches/results');
await page.waitForSelector('.match-item__team-container span');
const data = await page.evaluate(() => {
  const spans = document.querySelectorAll('.match-item__team-container span');
  spans.forEach(element => console.log(element.textContent));
  // or return the values of the selected items
  return Array.from(spans, element => element.textContent);
});

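evaluate runs your code inside the active Chrome tab, so you do not need jsDOM to parse the response. The raw response body is only the initial HTML, so elements that the page's scripts add afterwards are presumably missing from it, which would explain why the same selector works in the browser console but finds nothing in jsDOM.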
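As a hedged sketch of the approach above, the whole flow could look like the following. The names scrapeTeamNames and cleanNames are hypothetical helpers, not part of Puppeteer, and trimming the text is an assumption about what the output should look like; the URL and selector are the ones from the question.

```javascript
// Hypothetical helper: trims each text value and drops empty strings.
function cleanNames(texts) {
  return texts.map(text => text.trim()).filter(text => text.length > 0);
}

// Hypothetical wrapper around the steps described above.
async function scrapeTeamNames(url, selector) {
  // Required lazily so cleanNames can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForSelector(selector);
    // The callback runs in the browser tab; `sel` arrives as a serialized argument.
    const texts = await page.evaluate(
      sel => Array.from(document.querySelectorAll(sel), el => el.textContent),
      selector
    );
    return cleanNames(texts);
  } finally {
    // Close the browser even if navigation or the wait fails.
    await browser.close();
  }
}
```

It could be called as scrapeTeamNames('https://www.lcfc.com/matches/results', '.match-item__team-container span').then(names => console.log(names)). Returning the array from evaluate means the values arrive in Node.js directly, without forwarding console messages from the tab.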

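UPDATE: The new timeout error most likely comes from waitForSelector, which gives up after 30 seconds by default. Check that page.goto is called before it (the second snippet in the question omits it), and if the page is simply slow to load, pass a larger timeout, or { timeout: 0 } to disable the limit. The option belongs on waitForSelector; evaluate has no timeout option, and any extra arguments given to it are passed to the function that runs in the page.

await page.waitForSelector('.match-item__team-container span', { timeout: 60000 });
const data = await page.evaluate(() => {
  const spans = document.querySelectorAll('.match-item__team-container span');
  spans.forEach(element => console.log(element.textContent));
  // or return the values of the selected items
  return Array.from(spans, element => element.textContent);
});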
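In Puppeteer the timeout option is accepted by page.goto and page.waitForSelector, so one possible way to apply the same limit to both is sketched below. The helpers timeoutOptions and openAndWait are hypothetical names used only for illustration, and `page` is assumed to be a Puppeteer Page obtained from browser.newPage().

```javascript
// Hypothetical helper: builds the options object shared by goto and waitForSelector.
// Puppeteer expects milliseconds; a timeout of 0 disables the limit.
function timeoutOptions(seconds) {
  return { timeout: seconds * 1000 };
}

// Hypothetical helper: navigates first, then waits for the selector,
// using the same time limit for both steps.
async function openAndWait(page, url, selector, seconds = 60) {
  const options = timeoutOptions(seconds);
  await page.goto(url, options);
  return page.waitForSelector(selector, options);
}
```

For example, await openAndWait(page, 'https://www.lcfc.com/matches/results', '.match-item__team-container span', 0) would wait without a limit, which is convenient for debugging but can hang forever if the selector never appears.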

Upvotes: 1
