Kabal

Reputation: 53

Python urllib freezes with specific URL

I am trying to fetch a page, but urlopen hangs and never returns, even though the page is very light and opens in any browser without problems:

import urllib.request
with urllib.request.urlopen("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") as response:
    print(response.read())

This simple code freezes while retrieving the response, yet opening http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm in a browser works without any problem.

Upvotes: 3

Views: 1255

Answers (1)

olamork

Reputation: 179

www.planalto.gov.br is using user-agent detection. If you send a browser-like User-Agent header, the request completes correctly. urllib didn't crash; it's just waiting for a response that the server never sends.

curl -H "User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

worked just fine for me, but

curl http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

did not.

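Like RPGillespie said above, add the User-Agent header to the request. Since the question uses Python 3, that means urllib.request.Request (urllib2 is its Python 2 counterpart) or the requests library (see How do I set headers using python's urllib? for more information about that).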
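For the Python 3 code in the question, a minimal sketch of this could look like the following. The helper names build_request and fetch and the 10-second timeout are my own choices for illustration, and the User-Agent string is just the one from the curl command above; any browser-like value may work.

```python
import urllib.request

URL = "http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm"

# Browser-like User-Agent copied from the curl example (assumption: the
# server only checks that the header looks like a real browser).
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/57.0.2987.133 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Fetch url and return the raw body as bytes.

    The timeout (in seconds) makes urlopen raise an exception instead of
    waiting forever if the server never answers.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (performs a real network request):
# print(fetch(URL)[:200])
```

Passing a timeout is optional, but it turns a silent hang like the one in the question into an error you can see and handle.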

Upvotes: 1
