Rap

Reputation: 51

selenium phantomjs can't scrape a website bot detection

I can't scrape this site; here is a screenshot of the request made with Python, Selenium and PhantomJS. I don't know how they detected it was a bot, but the page in the screenshot says JavaScript and a captcha are needed, and maybe other things too? I'm definitely not scraping at superhuman speed, since this was my first request, so that was not the cause. P.S. When I paste the same request into my browser, it directs to the page I want and works fine.

    from selenium import webdriver

    br = webdriver.PhantomJS('bin/phantomjs')
    br.set_window_size(1366, 200)
    br.get("website")
    br.save_screenshot("screenshot.png")

Upvotes: 1

Views: 1941

Answers (2)

Grubshka

Reputation: 593

Things that can help in general:

  • Headers should be similar to those sent by common browsers, including a realistic User-Agent and Accept-Language
  • Navigation:
    • If you make multiple requests, put a random timeout between them
    • If you open links found in a page, set the Referer header accordingly
    • Better yet, simulate mouse activity to move, click and follow links
  • Images should be enabled
  • JavaScript should be enabled
    • Check that "navigator.plugins" and "navigator.language" are set in the client-side JavaScript page context
    • Check that the client you use does not inject noticeable JavaScript variables (like _cdc, __nightmare...)
  • Use proxies
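The header and timing points above can be sketched like this for Selenium + PhantomJS (Selenium 3.x, where `webdriver.PhantomJS` still exists). The Accept-Language value and the delay range are illustrative assumptions, not requirements:

```python
import random
import time

# Capabilities that make PhantomJS look more like a common browser.
cap = {
    "phantomjs.page.settings.javascriptEnabled": True,
    "phantomjs.page.settings.loadImages": True,
    "phantomjs.page.settings.userAgent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) "
        "Gecko/20100101 Firefox/50.0"
    ),
    # GhostDriver forwards phantomjs.page.customHeaders.* as extra HTTP headers.
    "phantomjs.page.customHeaders.Accept-Language": "en-US,en;q=0.5",
}

def polite_get(browser, url, min_delay=2.0, max_delay=6.0):
    """Pause a random interval before fetching, so requests are not evenly spaced."""
    time.sleep(random.uniform(min_delay, max_delay))
    browser.get(url)

# Usage (needs a phantomjs binary on disk):
# from selenium import webdriver
# br = webdriver.PhantomJS('bin/phantomjs', desired_capabilities=cap)
# polite_get(br, "http://example.com/")
```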

Upvotes: 0

Rap

Reputation: 51

Well, I got it working now. I'll simply put this here for the sake of other people who can't: enable JavaScript and fake the user agent.

    from selenium import webdriver

    cap = webdriver.DesiredCapabilities.PHANTOMJS
    cap["phantomjs.page.settings.javascriptEnabled"] = True
    cap["phantomjs.page.settings.loadImages"] = True
    cap["phantomjs.page.settings.userAgent"] = (
        'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'
    )
    br = webdriver.PhantomJS('bin/phantomjs', desired_capabilities=cap)
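A quick way to confirm the spoofed settings actually reach the page (a suggestion beyond the snippet above; `execute_script` is standard Selenium WebDriver API) is to ask the browser what its own JavaScript sees:

```python
def check_spoof(browser):
    """Return the User-Agent string that the page's JavaScript sees.

    `browser` is any Selenium WebDriver instance, e.g. the `br` built above.
    """
    return browser.execute_script("return navigator.userAgent")

# If the fake user agent took effect, this prints the Firefox string set above:
# print(check_spoof(br))
```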

Upvotes: 3
