Reputation: 73
I am trying to load a page, http://www.nhc.gov.cn/wjw/index.shtml, in Puppeteer as part of a COVID-tracking program. The page loads very quickly in the regular Chrome browser, but when I load it in Puppeteer, the request fails with an HTTP 412 (Precondition Failed) status. What can I do to get the page to load and fully simulate a regular browser visiting the page?
Code to reproduce the problem:
const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.launch({
    executablePath: '..\\executables\\chrome.exe',
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu']
  });
  const page = await browser.newPage();
  // Expose the handles globally for debugging.
  Object.assign(global, { browser, page });
  // Forward the page's console output to the Node console.
  page.on('console', msg => console.log(`chrome[${msg.text()}]`));
  await page.goto('http://www.nhc.gov.cn/wjw/index.shtml', { waitUntil: 'networkidle0' });
  // Fixed 15 s pause; page.waitFor is deprecated in newer Puppeteer releases.
  await new Promise(resolve => setTimeout(resolve, 15000));
  await page.screenshot({ path: 'nhc_scrape.png' });
  await browser.close();
})();
Thank you in advance for your help!
Upvotes: 1
Views: 724
Reputation: 46
You can use puppeteer-extra with the stealth plugin, which masks many of the signals that sites use to detect an automated browser:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
Here is my code:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

(async () => {
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({ headless: false, ignoreHTTPSErrors: true });
  const page = await browser.newPage();
  await page.goto('http://www.nhc.gov.cn/wjw/index.shtml');
  await page.waitForSelector('.inLists');
  await page.screenshot({ path: 'nhc_scrape.png' });
  await browser.close();
})();
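To check whether the stealth plugin actually gets past the 412, it may help to look at the status of the navigation response rather than only the screenshot. The following is a rough sketch, not a tested fix for this site: `describeStatus` and `checkPage` are hypothetical helper names of mine, and I am assuming the status returned by `page.goto` is a useful signal here (some sites answer the first request with a challenge page, so treat it as one hint among several).

```javascript
// Hypothetical helper: turn an HTTP status into a short verdict.
function describeStatus(status) {
  if (status === 412) return 'blocked';                 // the failure from the question
  if (status >= 200 && status < 400) return 'ok';       // success or redirect
  return 'error';                                       // anything else, including no response
}

// Hypothetical helper: load a URL with the stealth plugin and report the status.
async function checkPage(url) {
  // Required lazily so that describeStatus can be used without Puppeteer installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());

  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    // page.goto resolves with the main resource's response, or null.
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    const status = response ? response.status() : null;
    return { status, verdict: describeStatus(status) };
  } finally {
    // Always close the browser, even if navigation throws.
    await browser.close();
  }
}

// Usage (not run automatically):
// checkPage('http://www.nhc.gov.cn/wjw/index.shtml').then(console.log);
```

If the verdict is still 'blocked', the site is probably detecting something the plugin does not cover, and the screenshot step will not help diagnose it.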
Upvotes: 3