Reputation: 294
So in my web scraper function, I have the below lines of code:
let portList = [9050, 9052, 9053, 9054, 9055, 9056, 9057, 9058, 9059, 9060];
let spoofPort = portList[Math.floor(Math.random()*portList.length)];
console.log("The chosen port was " + spoofPort);
const browser = await puppeteerExtra.launch({ headless: true, args: [
'--no-sandbox', '--disable-setuid-sandbox', '--proxy-server=socks5://127.0.0.1:' + spoofPort
]});
const page = await browser.newPage();
const userAgent = 'Mozilla/5.0 (X11; Linux x86_64)' +
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36';
await page.setUserAgent(userAgent);
I'm trying to rotate the IP address on each request so that the scraped website doesn't block me as quickly (the function containing this code is called on every request from a client). Instead, I get the error below:
2021-05-17T12:08:19.625349+00:00 app[web.1]: The chosen port was 9050
2021-05-17T12:08:20.042016+00:00 app[web.1]: Error: net::ERR_PROXY_CONNECTION_FAILED at https://expampleDomanPlaceholder.com
2021-05-17T12:08:20.042018+00:00 app[web.1]: at navigate (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
2021-05-17T12:08:20.042018+00:00 app[web.1]: at processTicksAndRejections (internal/process/task_queues.js:93:5)
2021-05-17T12:08:20.042019+00:00 app[web.1]: at async FrameManager.navigateFrame (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
2021-05-17T12:08:20.042020+00:00 app[web.1]: at async Frame.goto (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
2021-05-17T12:08:20.042021+00:00 app[web.1]: at async Page.goto (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:819:16)
2021-05-17T12:08:20.042021+00:00 app[web.1]: at async /app/app.js:174:9
I've tried the solutions detailed in the posts below without success. Could the issue be with my userAgent?
Getting error when attempting to use proxy server in Node.js / Puppeteer
https://github.com/puppeteer/puppeteer/issues/2472
UPDATE: I tried this buildpack (https://github.com/iamashks/heroku-buildpack-tor-proxy.git), but it kept breaking my web dyno: Heroku returned an 'H14' error, and I had to clear the buildpacks and re-add them to recover. I'm not sure how to proceed from here, as that seemed to be the only solution I could find.
Upvotes: 3
Views: 7161
Reputation: 1267
There are a few issues.
Error: net::ERR_PROXY_CONNECTION_FAILED at https://expampleDomanPlaceholder.com
This error means Chromium could not connect to the proxy itself, so first check that a proxy is actually listening on the host and port passed to --proxy-server.
Here is an example that uses a public SOCKS4 proxy located in Cambodia.
Proxy IP address: 96.9.77.192, port 55796 (not sure if it still works).
const puppeteer = require('puppeteer');
(async () => {
  let launchOptions = {
    headless: false,
    args: [
      '--start-maximized',
      '--proxy-server=socks4://96.9.77.192:55796' // this is where we set the proxy
    ]
  };
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  // set viewport and user agent (just in case for nice viewing)
  await page.setViewport({width: 1366, height: 768});
  await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');
  // go to whatismycountry.com to see if proxy works (based on geography location)
  await page.goto('https://whatismycountry.com');
  // close the browser
  await browser.close();
})();
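For the per-request rotation described in the question, the proxy flag could be built from a randomly chosen port. The sketch below is only an illustration: `pickRandom` and `buildProxyArg` are hypothetical helper names, not part of Puppeteer, and the port list is copied from the question. It assumes a SOCKS5 proxy is listening on each of those ports.

```javascript
// Pick one entry at random; `rng` is injectable so the choice can be tested.
function pickRandom(list, rng = Math.random) {
  if (!Array.isArray(list) || list.length === 0) {
    throw new Error('pickRandom needs a non-empty array');
  }
  return list[Math.floor(rng() * list.length)];
}

// Build the Chromium flag, e.g. --proxy-server=socks5://127.0.0.1:9050
function buildProxyArg(scheme, host, port) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError('invalid port: ' + port);
  }
  return `--proxy-server=${scheme}://${host}:${port}`;
}

// Ports from the question; each one needs its own listener to exist.
const portList = [9050, 9052, 9053, 9054, 9055, 9056, 9057, 9058, 9059, 9060];
const proxyArg = buildProxyArg('socks5', '127.0.0.1', pickRandom(portList));
console.log(proxyArg);
```

The resulting string can be passed in the `args` array of `puppeteer.launch`, in the same position as the hard-coded flag in the example above.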
#Proxy Issue
If the proxy host requires authentication, the example below is a better fit.
'use strict';
const puppeteer = require('puppeteer');
(async () => {
  // credentials are read from the environment
  const username = process.env.USER;
  const password = process.env.PASS;
  const url = 'https://www.google.com';
  const browser = await puppeteer.launch({
    // proxy host must be correct.
    args: [
      '--proxy-server=socks5://proxyhost:8000',
    ],
  });
  const page = await browser.newPage();
  // answer the proxy's authentication challenge
  await page.authenticate({
    username,
    password,
  });
  await page.goto(url);
  await browser.close();
})();
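One thing to watch in the example above: on most Unix-like systems the `USER` environment variable is already set to the login name, so it may not contain the proxy username you expect. A minimal sketch of a safer lookup follows; `PROXY_USER` and `PROXY_PASS` are hypothetical variable names chosen for illustration, and `readProxyCredentials` is not a Puppeteer function.

```javascript
// PROXY_USER and PROXY_PASS are hypothetical names chosen for this sketch.
// `env` defaults to process.env but can be replaced for testing.
function readProxyCredentials(env = process.env) {
  const username = env.PROXY_USER;
  const password = env.PROXY_PASS;
  if (!username || !password) {
    // Fail early instead of sending empty credentials to the proxy.
    throw new Error('Set PROXY_USER and PROXY_PASS before launching the browser');
  }
  return { username, password };
}

// Usage inside the script above (not executed here):
//   await page.authenticate(readProxyCredentials());
```

Failing before the browser launches makes a missing credential easier to tell apart from a proxy that is unreachable.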
This also worked with Tor, whose SOCKS5 listener defaults to port 9050:
Tor ('--proxy-server=socks5://localhost:9050')
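Since ERR_PROXY_CONNECTION_FAILED points at the proxy being unreachable, it may help to check which ports actually have a listener before launching the browser. Tor only opens ports other than its default if they are configured (for example through additional SocksPort entries in torrc). The sketch below uses only Node's built-in `net` module; `isPortOpen` and `findOpenPorts` are hypothetical helper names. It only confirms that something accepts TCP connections on the port, not that it speaks SOCKS5.

```javascript
const net = require('net');

// Resolves true if something accepts a TCP connection on host:port, false otherwise.
function isPortOpen(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (result) => {
      socket.destroy(); // release the socket whatever the outcome
      resolve(result);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Keep only the ports that currently have a listener.
async function findOpenPorts(ports, host = '127.0.0.1') {
  const results = await Promise.all(ports.map((p) => isPortOpen(p, host)));
  return ports.filter((_, i) => results[i]);
}
```

Calling `findOpenPorts` on the port list from the question and picking a random entry from the result would avoid choosing a port with nothing behind it. An empty result suggests that Tor is not running in the dyno at all.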
References: thanks to @Grant Miller for the TOR testing.
https://dev.to/sonyarianto/practical-puppeteer-using-proxy-to-browse-a-page-1m82
How to make puppeteer work through socks5 proxy?
Upvotes: 2