Ender Bonnet

Reputation: 163

Can't Scrape Input Value from dolartoday.com with Puppeteer

I want to scrape the value of the #result element with the following code:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://dolartoday.com');
  await console.log(page.evaluate(() => document.getElementById('result')));

  await browser.close();
})();

But it keeps logging the following error:

(node:74908) UnhandledPromiseRejectionWarning: Error: Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (/Volumes/DATOS/Dropbox/workspaces/dolar-today/server/node_modules/puppeteer/lib/NavigatorWatcher.js:71:21)
at <anonymous>
(node:74908) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:74908) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Any idea on how to solve this problem?

Upvotes: 0

Views: 1486

Answers (1)

Grant Miller

Reputation: 29047

First and foremost, you are applying the await operator to console.log() (a synchronous function that returns undefined) rather than to page.evaluate() (an asynchronous function that returns a Promise).
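The ordering problem can be reproduced without a browser. In this minimal sketch, `fakeEvaluate()` is a hypothetical stand-in for `page.evaluate()` that simply returns an already-resolved Promise; the only thing that matters is where `await` sits.

```javascript
'use strict';

// Hypothetical stand-in for page.evaluate(): returns a Promise that resolves
// to a plain string, as evaluate() does when the page function returns one.
function fakeEvaluate() {
  return Promise.resolve('placeholder');
}

// Same shape as the question's code: log() receives the Promise object itself,
// and awaiting log()'s return value (undefined) changes nothing.
async function wrongOrder(log) {
  await log(fakeEvaluate());
}

// Same shape as the fix: wait for the Promise first, then log its value.
async function rightOrder(log) {
  log(await fakeEvaluate());
}

(async () => {
  await wrongOrder(console.log); // prints: Promise { 'placeholder' }
  await rightOrder(console.log); // prints: placeholder
})();
```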

You are also attempting to return a DOM element from the page to the Node.js environment. That will not work, because page.evaluate() can only send back a serializable value, and a DOM element is not serializable.

If you would like to return the value of the #result element on the web page, you should rewrite your logic as follows:

console.log(await page.evaluate(() => document.getElementById('result').value));
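Puppeteer also offers `page.$eval()`, which passes the element matching a selector to a callback that runs inside the page and returns only that callback's result. A minimal sketch using it; `readResultValue` is a hypothetical helper name, and `page.$eval()` rejects if no element matches `#result`.

```javascript
'use strict';

// Hypothetical helper: reads the value of the #result input on an open page.
// The arrow function runs inside the browser and returns a string, which is
// serializable, so only that string travels back to Node.js.
async function readResultValue(page) {
  return page.$eval('#result', el => el.value);
}

// Usage, once `page` has navigated to the site:
//   console.log(await readResultValue(page));
```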

Furthermore, the navigation timed out after 30000 milliseconds, which is the default maximum navigation time. You can raise this limit with the timeout option of page.goto():

await page.goto('https://dolartoday.com', {
  timeout: 60000,
});
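By default, `page.goto()` waits for the page's `load` event, which on an ad-heavy site may take a long time or never arrive within the limit. Waiting for `domcontentloaded` instead can resolve much sooner. This is a sketch under an assumption the question does not confirm: that `#result` already holds the value you need once the DOM is parsed. If the page's scripts fill it in later, you may read an empty value. `openSite` is a hypothetical helper name.

```javascript
'use strict';

// Hypothetical helper: navigates without waiting for every image, ad and
// tracker to finish loading. page.goto() resolves once DOMContentLoaded fires,
// or rejects if that takes longer than 60 seconds.
async function openSite(page, url) {
  return page.goto(url, {
    waitUntil: 'domcontentloaded',
    timeout: 60000,
  });
}

// Usage:
//   await openSite(page, 'https://dolartoday.com');
```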

You can also abort requests for resources you don't need by enabling page.setRequestInterception() and handling page.on('request'). Skipping images, stylesheets, and fonts can make the page load considerably faster:

await page.setRequestInterception(true);

page.on('request', request => {
  if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
    request.abort();
  } else {
    request.continue();
  }
});
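The same filter can also be written as a named handler backed by a `Set`, which makes the blocked list easier to extend. This is a stylistic sketch with the same behavior; `handleRequest` and `BLOCKED_TYPES` are hypothetical names. Once interception is enabled, Puppeteer stalls each request until it is aborted, continued, or responded to, so every branch must call one of those methods.

```javascript
'use strict';

// Resource types to skip; these are values returned by request.resourceType().
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Handler for page.on('request', ...): abort blocked types, let the rest through.
function handleRequest(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
}

// Usage, after `await page.setRequestInterception(true);`:
//   page.on('request', handleRequest);
```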

Your final program should look something like this:

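'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();

  try {
    const page = await browser.newPage();

    // Skip images, stylesheets and fonts so the page loads faster.
    await page.setRequestInterception(true);

    page.on('request', request => {
      if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
        request.abort();
      } else {
        request.continue();
      }
    });

    // Allow up to 60 seconds for navigation instead of the default 30.
    await page.goto('https://dolartoday.com', {
      timeout: 60000,
    });

    // Return the input's value (a string), not the DOM element itself.
    console.log(await page.evaluate(() => document.getElementById('result').value));
  } finally {
    // Close the browser even if navigation or evaluation fails.
    await browser.close();
  }
})().catch(error => {
  // Report failures instead of leaving an unhandled promise rejection.
  console.error(error);
  process.exitCode = 1;
});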

Upvotes: 2
