shanglun

Reputation: 73

Page loads in regular Chrome but not in Puppeteer

I am trying to load a page, http://www.nhc.gov.cn/wjw/index.shtml, in Puppeteer as part of a COVID-tracking program. The page loads very quickly in regular Chrome, but when I load it in Puppeteer, the request fails with an HTTP 412 status. What can I do to get the page to load, so that Puppeteer behaves like a regular browser visiting the page?

The code to reproduce the problem is below:

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.launch({ executablePath: '..\\executables\\chrome.exe', headless: false, args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu'] });
  const page = await browser.newPage();
  Object.assign(global, { browser, page });

  page.on('console', msg => console.log(`chrome[${msg.text()}]`));
  await page.goto('http://www.nhc.gov.cn/wjw/index.shtml', { waitUntil: 'networkidle0' });
  await page.waitFor(15000);
  
  await page.screenshot({path: 'nhc_scrape.png'});

  await browser.close();
})();

Thank you in advance for your help!

Upvotes: 1

Views: 724

Answers (1)

Ericar974

Reputation: 46

You can use puppeteer-extra with its stealth plugin, which is designed to make an automated browser harder for sites to detect:

https://www.npmjs.com/package/puppeteer-extra-plugin-stealth

Here is my code:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before launching the browser
puppeteer.use(StealthPlugin());

(async () => {
    const browser = await puppeteer.launch({ headless: false, ignoreHTTPSErrors: true });
    const page = await browser.newPage();

    await page.goto('http://www.nhc.gov.cn/wjw/index.shtml');
    // Wait for an element with class "inLists" to appear before taking the screenshot
    await page.waitForSelector('.inLists');

    await page.screenshot({ path: 'nhc_scrape.png' });
    await browser.close();
})();
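
If you want to confirm that the page is no longer answering with a 412, one option is to inspect the response that page.goto resolves to. The following is only a minimal sketch: gotoOrThrow is a hypothetical helper name of my own, not part of Puppeteer, and it assumes nothing beyond the status() and ok() methods on the response object.

```javascript
// Hypothetical helper (not a Puppeteer API): navigate and fail loudly
// when the server answers with a non-2xx status such as 412.
async function gotoOrThrow(page, url, options = {}) {
  // page.goto resolves to the main resource's response, or null in some cases
  const response = await page.goto(url, options);
  if (response === null) {
    throw new Error(`No response received for ${url}`);
  }
  // response.ok() is true only for statuses in the 200-299 range
  if (!response.ok()) {
    throw new Error(`Navigation to ${url} failed with HTTP ${response.status()}`);
  }
  return response;
}
```

In the script above, calling await gotoOrThrow(page, 'http://www.nhc.gov.cn/wjw/index.shtml') in place of the bare page.goto call should surface a blocked request as an error instead of a blank screenshot.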

Upvotes: 3
