Reputation: 2460
I am using CasperJS to log in to my Amazon account and retrieve some data.
But once in a while I get a captcha on login. CasperJS shows me the captcha, and I manually return the solution so it can submit the form.
The problem is that CasperJS immediately gets another captcha, this time a more difficult one. I solve that one too, but another captcha appears, and so on indefinitely.
I don't do anything special, just some CasperJS fill and click calls. CasperJS loads an external JS file containing the captcha solution into the page and then submits the form.
I am sure that the correct captcha solution is submitted. How can Amazon be so sure that it should trap me in an infinite loop?
Upvotes: 1
Views: 994
Reputation: 2108
Unfortunately this is not an exact science, so there is probably no general, durable solution. Amazon.com uses several techniques to check whether you are a robot, including browser fingerprinting, cookie challenges and user behavior profiling (mouse movements and so on).
I would first try randomizing part of the user agent, just to see whether that helps. I would also try a full browser such as Chromium running headless, driven through Selenium so the script can talk to it.
Can I ask how frequently you are trying to crawl your account? I think it shouldn't be a big deal if you are doing it once a day or so.
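As a rough sketch of the user-agent idea, the snippet below picks one string at random from a small pool. The pool contents and the `pickUserAgent` helper are assumptions made for illustration, not anything Amazon or CasperJS prescribes, and there is no guarantee that this alone gets past the captcha loop.

```javascript
// Hypothetical pool of user-agent strings; replace these with strings
// copied from real browsers you actually use.
var USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30',
  'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'
];

// Return one entry of `pool`. `random` is optional and defaults to
// Math.random; passing it explicitly makes the choice reproducible.
function pickUserAgent(pool, random) {
  var rnd = random || Math.random;
  var index = Math.floor(rnd() * pool.length);
  return pool[index];
}

// Inside a CasperJS script (not executed here), this could be applied with:
//   casper.start();
//   casper.userAgent(pickUserAgent(USER_AGENTS));
```

The code is written in ES5 style so that it can run under PhantomJS as well as Node.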
Upvotes: 0
Reputation: 1089
Consider how it looks from their point of view. They can tell a robot is accessing your account based on mouse and keyboard interactions. A human will scan the page and move their mouse randomly while searching for the login buttons. Your script jumps directly to clicking the selector.
When a captcha appears, you fill it in. This does not prove you are a human. This simply proves that your robot can alert you to a captcha for a human to fill in. The rest of the interactions are all done by a robot, and Amazon is fully aware of this. You can answer as many captchas as you like, but the interactions to get this far are still going to be flagged as a robot.
You may want to go down a different route, like having a cookie to start a CasperJS session with your account already logged in. Alternatively, does Amazon provide any sort of API to pull out the value you're interested in?
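A minimal sketch of the cookie route might look like the following: save the cookies once after a successful login, then load them at the start of later runs. The helper names and the `cookies.json` file name are hypothetical, and the `expiry` field (seconds since the epoch) is an assumption about the shape of the cookie objects. Whether a restored session actually skips the captcha is not guaranteed.

```javascript
// Keep only cookies that have not expired yet.
// Assumption: a cookie may carry a numeric `expiry` field in seconds since
// the epoch; cookies without one (session cookies) are kept as they are.
function filterLiveCookies(cookies, nowSeconds) {
  return cookies.filter(function (cookie) {
    return typeof cookie.expiry !== 'number' || cookie.expiry > nowSeconds;
  });
}

// Turn an array of cookie objects into a JSON string for storage.
function serializeCookies(cookies) {
  return JSON.stringify(cookies, null, 2);
}

// Parse a stored JSON string and drop cookies that have already expired.
function restoreCookies(json, nowSeconds) {
  return filterLiveCookies(JSON.parse(json), nowSeconds);
}

// Inside a CasperJS/PhantomJS script (not executed here):
//   var fs = require('fs');
//   // once, after a successful login:
//   fs.write('cookies.json', serializeCookies(phantom.cookies), 'w');
//   // on later runs, before casper.start():
//   phantom.cookies = restoreCookies(fs.read('cookies.json'), Date.now() / 1000);
```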
They're blocking your robot out of genuine love and concern, if that makes you feel any better!
Upvotes: 0