urllib2 request returns a different page about 1 in 5 times

Question

import urllib2

req = urllib2.Request('http://www.amazon.com/Sweet-Virgin-Organic-Coconut-13-5oz/dp/B00Q5CIL4Y', headers={ 'User-Agent': 'Mozilla/5.0' })

html = urllib2.urlopen(req).read()
print len(html)

That's the smallest example I can make. If you run it, about 1 in 5 times the response is only 5769 bytes long; the rest of the time it is a normal, usable page.

What's up with this?

edit:

incorrect response: http://pastebin.com/d7zdy0uv

Steve Jessop · Accepted Answer

Given the content of the short responses, this becomes much easier to answer. Amazon suspects you're doing automated scraping of its site, and has served you a CAPTCHA that, if you were a human using a browser, you could solve.

I'm slightly surprised it only hits you one in five requests, though, rather than either always or never.

As it says in Amazon's response, consider using their APIs instead.
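If you want your script to at least notice when it has been handed the CAPTCHA page instead of the product page, a rough check along these lines might help. This is only a sketch: the helper name `looks_like_captcha` and the marker strings in `CAPTCHA_MARKERS` are assumptions for illustration, not something taken from Amazon's documentation, so compare them against the short response you actually receive and adjust.

```python
# Hypothetical marker strings (an assumption, not an official list):
# replace them with text that really appears in the short response.
CAPTCHA_MARKERS = ("captcha", "robot check")


def looks_like_captcha(html, markers=CAPTCHA_MARKERS):
    """Return True if the page body contains any of the marker strings."""
    # urlopen(...).read() returns bytes, so decode before searching.
    if isinstance(html, bytes):
        html = html.decode("utf-8", "replace")
    text = html.lower()
    return any(marker in text for marker in markers)
```

In the script from the question you would call `looks_like_captcha(html)` right after `read()` and skip parsing when it returns True. That only detects the block page; it does not get you past it, so the API suggestion still stands.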
