Maverick

Reputation: 74

Scrape dynamic site using puppeteer

I am trying to build a simple scraper for the Trailblazer Profile site. I want to get the user's number of badges and points.

So I am using cheerio and puppeteer to accomplish this.

Here is my code:

 .get("/:profile", (req,res,next) => {

  const url = "https://trailblazer.me/id/hverma99";

  async function getPage(url) {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle0'});

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();
    return html;
  }

  const html = getPage(url);
  const $ = cheerio.load(html);
  const span = $('.tds-tally__count.tds-tally__count_success');
  console.log(span.text());

});

The profile parameter is not used yet, as I am just testing this.

Problem: whenever I run this code, nothing is printed to the console, and if I try without puppeteer I only get the HTML without any data. My expected result is the number of badges and points.

Let me know what is wrong with this code.

Thanks

Upvotes: 2

Views: 1891

Answers (1)

Ashish Modi

Reputation: 7770

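The scraping logic is fine. The problem is that getPage is async, so calling it without await returns a Promise rather than the HTML string, and cheerio.load never sees the page markup. Await the call, like this: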

.get("/:profile", async (req,res,next) => {

  const url = "https://trailblazer.me/id/hverma99";

  async function getPage(url) {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle0'});

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();
    return html;
  }

  const html = await getPage(url); // wait for the Promise to resolve to the HTML string
  const $ = cheerio.load(html);
  const span = $('.tds-tally__count.tds-tally__count_success');
  console.log(span.text());

});

The route handler itself must also be declared async, i.e. async (req,res,next), because await is only valid inside an async function.
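To see the difference in isolation, here is a minimal sketch that does not need puppeteer or cheerio. getPageStub, loadWithoutAwait and loadWithAwait are hypothetical names made up for this illustration; the stub simply resolves to a fixed HTML string, standing in for the real getPage.

```javascript
// Hypothetical stand-in for getPage(): resolves to a fixed HTML string
// instead of launching a browser (assumption for illustration only).
async function getPageStub() {
  return '<span class="tds-tally__count">42</span>';
}

// Without await, the call hands back a Promise, not the HTML string.
function loadWithoutAwait() {
  const html = getPageStub();
  return html instanceof Promise;
}

// With await, execution pauses until the Promise resolves to the string.
async function loadWithAwait() {
  const html = await getPageStub();
  return typeof html;
}

console.log(loadWithoutAwait()); // true
loadWithAwait().then((type) => console.log(type)); // string
```

In the original route, cheerio.load was handed the Promise from the first case, which is why the selector matched nothing and the console stayed empty.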

Upvotes: 3
