On Python 3.2 I get the following error when trying to get the HTML of a remote site; the same code works well on Python 2.7.
Code:
import requests
from random import randint
from time import sleep

def connectAmazon():
    usleep = lambda x: sleep(x/1000000.0)  # sleep for x microseconds
    factor = 400
    shouldRetry = True
    retries = 0
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'}
    attempt = 0
    while shouldRetry == True:
        random = randint(2, 9)
        attempt += 1
        print ("Attempt#", attempt)
        #print (attempt)
        url = "http://www.amazon.com/gp/offer-listing/B009OZUPUC/sr=/qid=/ref=olp_prime_new?ie=UTF8&colid=&coliid=&condition=new&me=&qid=&seller=&shipPromoFilter=1&sort=sip&sr"
        html = requests.get(url, headers=headers)  # send the custom User-Agent defined above
        status = html.status_code
        if status == 200:
            shouldRetry = False
            print ("Success. Check HTML Below")
            print(html.text) #The Buggy Line
            break
        elif status == 503:
            retries += 1
            delay = random * (pow(retries, 4)*100)
            print ("Delay(ms) = ", delay)  # note: usleep() treats delay as microseconds
            #print (delay)
            usleep(delay)
            shouldRetry = True

connectAmazon()
What needs to be done to resolve this on Python 3.2, or on Python 3.x in general?
OK, the Windows command line is very problematic with encodings*. The encoding error happens because, when outputting, print encodes html.text into the cmd encoding (you can find out which one it is by running the chcp command). There is probably at least one character in html.text that can't be encoded in cmd's encoding.
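To confirm which character is the culprit, a small check like the following can help. This is a sketch: the helper name find_unencodable is hypothetical, and it assumes the console code page is cp437 (a common default on English Windows installs; check yours with chcp and substitute it).

```python
import sys


def find_unencodable(text, encoding):
    """Return (index, char) pairs for characters that encoding cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, ch))
    return bad


if __name__ == "__main__":
    # The encoding print() will use; on a Windows console this is normally
    # the code page reported by chcp (e.g. cp437).
    print("stdout encoding:", sys.stdout.encoding)
    sample = "price: 5\u20ac, caf\u00e9"  # the euro sign is not in cp437, e-acute is
    # ascii() keeps the report itself printable on any console.
    print(ascii(find_unencodable(sample, "cp437")))  # -> [(8, '\u20ac')]
```

In practice you would call find_unencodable(html.text, sys.stdout.encoding) to list the characters that make print fail.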
My solution for Python 3 would be to force an output encoding. Sadly, in Python 3 this is a little more problematic than I would like. You'll need to replace the line print(html.text) with:
import sys
sys.stdout.buffer.write(html.text.encode('utf8'))
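If you would rather keep print() and the console's own encoding than push UTF-8 bytes past it, another common approach (a swap-in, not what this answer does) is to encode with errors="replace" and decode back, so characters the code page lacks become "?". A minimal sketch; console_safe is a hypothetical helper name, and it assumes sys.stdout.encoding reports the cmd code page, as Python 3 normally does for a console.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters that encoding can't represent replaced by '?'."""
    # Default to the encoding print() will use; fall back to UTF-8 if unknown.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, "replace").decode(encoding)


if __name__ == "__main__":
    # Stands in for print(html.text); the snowman is missing from many code pages.
    print(console_safe("caf\u00e9 \u2603"))
```

The trade-off: the output stays readable in cmd, but any character the code page lacks is lost.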
Of course, that line won't work in Python 2. In Python 2 you can just encode your output before printing it, so print(html.text) can be replaced with:
print html.text.encode('utf8')
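If one script has to run on both versions, the Python 3 and Python 2 lines from this answer can be folded into one helper. This is a sketch: the name write_utf8 and the optional stream parameter are my own, and the Python 2 branch relies on sys.stdout there accepting byte strings. Like the original lines, it writes raw UTF-8, so cmd may still show garbled characters unless its code page matches; it only stops the exception.

```python
import sys


def write_utf8(text, stream=None):
    """Write text to stream as UTF-8 bytes on Python 2 or Python 3."""
    if stream is None:
        stream = sys.stdout
    data = text.encode("utf-8")
    # Python 3 text streams expose the underlying binary stream as .buffer;
    # Python 2's sys.stdout has no .buffer and accepts byte strings directly.
    binary = getattr(stream, "buffer", None)
    if binary is not None:
        stream.flush()  # keep text printed earlier ahead of these bytes
        binary.write(data)
        binary.flush()
    else:
        stream.write(data)


if __name__ == "__main__":
    write_utf8(u"caf\u00e9\n")
```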
Important note: in Python 2, print is a keyword, not a function. So calling print('hi') works because print is printing the expression inside the parentheses. When you do print('hi',2) you'll get the tuple ('hi',2) printed. That's not exactly what you want. It works by miracle :D
Hope this helps!
* This is due to its lack of support for UTF-8. There is a weird 65001 code page, which is not entirely the same as UTF-8, and Python does not work with it.