Reputation: 1528
I am trying to download a URL using urllib2 in Python.
The code is the following:
import urllib2
req = urllib2.Request('www.tattoo-cover.co.uk')
req.add_header('User-agent','Mozilla/5.0')
result = urllib2.urlopen(req)
It raises a ValueError and the program crashes for the URL in the example. When I access the URL in a browser, it works fine.
Any ideas how to handle the problem?
UPDATE:
Thanks to Ben James and sth, the problem has been identified: the URL needs the 'http://' prefix.
Now the question is refined: is it possible to handle such cases automatically with some built-in function, or do I have to do error handling with subsequent string concatenation?
Upvotes: 27
Views: 54269
Reputation: 875
You can use the urlparse function from urllib.parse (Python 3) to check whether the URL has an addressing scheme (http, https, ftp) and prepend one if it is missing:
In [1]: from urllib.parse import urlparse
   ...:
   ...: url = 'www.myurl.com'
   ...: if not urlparse(url).scheme:
   ...:     url = 'http://' + url
   ...:
   ...: url
Out[1]: 'http://www.myurl.com'
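The same check could be wrapped in a small helper so it can be reused before every request. This is only a sketch: the name ensure_scheme and the default_scheme parameter are made up for illustration, and inputs of the form host:port may be read by urlparse as already having a scheme (depending on the Python version), so they would need separate handling.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already has a scheme, else prefix one.

    ensure_scheme and default_scheme are hypothetical names for this sketch.
    """
    if urlparse(url).scheme:
        # A scheme such as http, https or ftp is already present.
        return url
    return default_scheme + "://" + url


print(ensure_scheme("www.tattoo-cover.co.uk"))
print(ensure_scheme("https://www.tattoo-cover.co.uk/"))
```

The result can then be passed to Request or urlopen as usual.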
Upvotes: 1
Reputation: 125119
When you enter a URL in a browser without the protocol, it defaults to HTTP. urllib2 won't make that assumption for you; you need to prefix it with http://.
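If you would rather react to the error than inspect the URL first, a rough Python 3 sketch could look like the following. It assumes urllib.request (the Python 3 successor of urllib2), where constructing a Request from a URL without a scheme raises ValueError. The function name build_request is hypothetical, and falling back to http:// is an assumption that mirrors the browser behaviour described above.

```python
from urllib.request import Request


def build_request(url):
    """Build a Request, retrying with an http:// prefix on ValueError.

    build_request is a hypothetical helper name for this sketch.
    """
    try:
        req = Request(url)
    except ValueError:
        # No scheme was given; assume plain HTTP, as a browser would.
        req = Request("http://" + url)
    req.add_header("User-agent", "Mozilla/5.0")
    return req


req = build_request("www.tattoo-cover.co.uk")
print(req.full_url)
```

The returned object can be handed to urllib.request.urlopen; no network access happens until then.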
Upvotes: 46
Reputation: 229563
You have to use a complete URL including the protocol, not just specify a host name.
The correct URL would be http://www.tattoo-cover.co.uk/.
Upvotes: 7