Reputation: 93
My scripts stopped working two days ago. If I do the same work manually from any browser (Chrome, Mozilla, etc.), there is no problem. I think the problem is in the headers PhantomJS sends. How can I simulate headers in PhantomJS as if it were a normal browser? The lines below are what the website shows me when I access it with PhantomJS / CasperJS:
Pardon Our Interruption...
As you were browsing something about your browser made us think you were a bot. There are a few reasons this might happen:
You're a power user moving through this website with super-human speed.
You've disabled JavaScript in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running.
Additional information is available in this support article.
After completing the CAPTCHA below, you will immediately regain access to
In my scripts I have this configuration:
var casper = require("casper").create({
    engine: 'phantomjs',
    exitOnError: false,
    ignoreSslErrors: true,
    waitTimeout: 5000,
    stepTimeout: 5000,
    verbose: true,
    pageSettings: {
        webSecurityEnabled: false,
        javascriptEnabled: true,
        loadImages: true,
        loadPlugins: true,
        localToRemoteUrlAccessEnabled: true,
        userAgent: 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        XSSAuditingEnabled: false,
        logLevel: 'debug'
    },
    onWaitTimeout: function() {
        // this.echo('** Wait-TimeOut **');
    },
    onStepTimeout: function() {
        // this.echo('** Step-TimeOut **');
    }
});
Upvotes: 1
Views: 1158
Reputation: 593
Things that can help in general:
Upvotes: 1
Reputation: 16838
First things first: if a third-party site goes to that much effort to detect bots, they probably do not want you to use bots, so you should probably comply.
As for ways of detecting PhantomJS: there are plenty, from the wrong order of request headers and the absence of media plugins to PhantomJS-specific methods and even the disclosure of phantomjs in error stack traces.
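To make a few of those signals concrete, here is a minimal sketch of the kind of check a site might run in the page. The function name `phantomSignals` and the signal labels are hypothetical, made up for illustration; real detection scripts combine many more signals (header order and stack traces are not covered here). The function takes a window-like object so it can be tried outside a browser.

```javascript
// Hypothetical helper, for illustration only: lists the signals that
// would give PhantomJS away on a window-like object `win`.
function phantomSignals(win) {
    var signals = [];
    var nav = win.navigator || {};

    // PhantomJS exposes these globals to scripts running in the page.
    if (win.callPhantom || win._phantom) {
        signals.push('phantom-globals');
    }

    // A desktop browser usually reports at least one plugin;
    // an empty or missing plugin list is suspicious.
    if (!nav.plugins || nav.plugins.length === 0) {
        signals.push('no-plugins');
    }

    // The default PhantomJS user agent contains its own name.
    // Overriding userAgent (as in the question) only hides this one signal.
    if (/PhantomJS/i.test(nav.userAgent || '')) {
        signals.push('user-agent');
    }

    return signals;
}
```

Note that a spoofed user agent only clears the last check; the other two still fire, which is consistent with the site still showing a CAPTCHA.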
Here's an excellent presentation on the matter: Detecting headless browsers.
I know that just giving links to remote pages is frowned upon, but there are too many different points to cover here, and they all need to be addressed in any counter-detection effort.
Bonus suggestion: have a look at puppeteer if you're not too invested in PhantomJS infrastructure.
Upvotes: 1