Javier Sega

Reputation: 93

PhantomJS / CasperJS stopped working: website detects bot

My scripts stopped working two days ago. If I perform the same tasks manually from any browser (Chrome, Mozilla, etc.), there is no problem. I think the problem is in the headers that PhantomJS sends. How can I make PhantomJS send headers as if it were a normal browser? The lines below are what the website shows me when I access it with PhantomJS / CasperJS:

Pardon Our Interruption...

As you were browsing something about your browser made us think you were a bot. There are a few reasons this might happen:

  • You're a power user moving through this website with super-human speed.
  • You've disabled JavaScript in your web browser.
  • A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running.

Additional information is available in this support article.

After completing the CAPTCHA below, you will immediately regain access to

In my scripts I have this configuration:

var casper = require("casper").create({
    engine: 'phantomjs',
    exitOnError: false,
    ignoreSslErrors: true,
    waitTimeout: 5000,
    stepTimeout: 5000,
    verbose: true,
    // logLevel is a Casper option, not a page setting, so it belongs here
    logLevel: 'debug',

    pageSettings: {
        webSecurityEnabled: false,
        javascriptEnabled: true,
        loadImages: true,
        loadPlugins: true,
        localToRemoteUrlAccessEnabled: true,
        userAgent: 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        XSSAuditingEnabled: false
    },
    onWaitTimeout: function() {
        // this.echo('** Wait-TimeOut **');
    },
    onStepTimeout: function() {
        // this.echo('** Step-TimeOut **');
    }
});

Upvotes: 1

Views: 1158

Answers (2)

Grubshka

Reputation: 593

Things that can help in general:

  • Headers should be similar to those of common browsers, including:
  • Navigation:
    • If you make multiple requests, put a random timeout between them
    • If you open links found in a page, set the Referer header accordingly
    • Or better, simulate mouse activity to move, click and follow links
  • Images should be enabled
  • JavaScript should be enabled
    • Check that "navigator.plugins" and "navigator.language" are set in the client JavaScript page context
    • Check that the client you use does not inject noticeable JavaScript variables (like _cdc, __nightmare...)
  • Use proxies
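
The header and navigation points above could be sketched roughly as follows. This is only an illustration: `browserLikeHeaders` and `randomDelay` are hypothetical helpers (not part of CasperJS or PhantomJS), the header values and the example.com URLs are placeholders, and the CasperJS part assumes the script is launched with the `casperjs` command. Copying the real header values from the network tab of your own browser is likely to work better than the ones shown here.

```javascript
// Hypothetical helper: build request headers resembling a desktop browser.
// The values are illustrative assumptions, not taken from the question.
function browserLikeHeaders(referer) {
    var headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.8'
    };
    // Only send a Referer when the URL was reached from another page
    if (referer) {
        headers['Referer'] = referer;
    }
    return headers;
}

// Hypothetical helper: random integer delay in milliseconds, minMs..maxMs inclusive
function randomDelay(minMs, maxMs) {
    return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// The part below only runs under PhantomJS/CasperJS, where `phantom` is a global
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create({
        pageSettings: { loadImages: true, javascriptEnabled: true }
    });
    var listUrl = 'http://example.com/list';   // placeholder URL
    var itemUrl = 'http://example.com/item/1'; // placeholder URL

    casper.start();
    casper.thenOpen(listUrl, { headers: browserLikeHeaders() });
    casper.then(function() {
        // Pause for a random time before following the link
        this.wait(randomDelay(2000, 6000));
    });
    // Pretend the second page was reached from the first one
    casper.thenOpen(itemUrl, { headers: browserLikeHeaders(listUrl) });
    casper.run();
}
```

Headers alone are unlikely to be enough if the site also runs JavaScript checks in the page, so treat this as one piece of the list above rather than a complete fix.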

Upvotes: 1

Vaviloff

Reputation: 16838

First things first: if a third-party site goes to that much effort to detect bots, they probably do not want you to use bots, so you should probably comply.

As for ways of detecting PhantomJS: there are plenty, from the wrong order of request headers and the absence of media plugins to PhantomJS-specific methods and even the disclosure of phantomjs in error stack traces.

Here's an excellent presentation on the matter: Detecting headless browsers.

I know that just giving links to remote pages is frowned upon, but there are too many different points to mention here, and they all should be addressed in counter-detection efforts.
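
To give a rough idea of what such checks look like from the site's side, here is a minimal sketch covering only a few of them. `findAutomationHints` is a hypothetical function written for this illustration, not something taken from the presentation; `window.callPhantom` and `window._phantom` are globals that PhantomJS exposes in the page, and the plugin, language and user agent checks are simplified assumptions about what a detector might test. Header order and stack traces are not covered.

```javascript
// Hypothetical helper: list a few signs that a page is running in PhantomJS.
// `win` is optional; inside a real page it defaults to the global window.
function findAutomationHints(win) {
    win = win || window;
    var nav = win.navigator || {};
    var hints = [];

    // PhantomJS-specific globals injected into the page context
    if (typeof win.callPhantom !== 'undefined') {
        hints.push('window.callPhantom is defined');
    }
    if (typeof win._phantom !== 'undefined') {
        hints.push('window._phantom is defined');
    }
    // Regular desktop browsers usually report at least one plugin
    if (!nav.plugins || nav.plugins.length === 0) {
        hints.push('navigator.plugins is empty');
    }
    if (!nav.language) {
        hints.push('navigator.language is not set');
    }
    // The default PhantomJS user agent contains "PhantomJS"
    if (/PhantomJS/i.test(nav.userAgent || '')) {
        hints.push('user agent mentions PhantomJS');
    }
    return hints;
}

// The part below only runs under PhantomJS/CasperJS, where `phantom` is a global
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    casper.start('http://example.com/', function() { // placeholder URL
        // evaluate() runs the function inside the page, so `win` falls back to window
        var hints = this.evaluate(findAutomationHints);
        this.echo(hints.length ? hints.join('\n') : 'no obvious hints found');
    });
    casper.run();
}
```

Running this against your own CasperJS setup shows which of these signals your scripts currently give away; an empty result does not mean the site cannot detect you by other means.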

Bonus suggestion: have a look at puppeteer if you're not too invested in PhantomJS infrastructure.

Upvotes: 1
