Reputation: 9538
After watching a video about downloading images with Python, I typed in the code from the video. Here it is:
import pandas as pd
import urllib.request

def url_to_jpg(i, url, file_path):
    filename = 'image-{}.jpg'.format(i)
    fullpath = '{}{}'.format(file_path, filename)
    print(fullpath)
    urllib.request.urlretrieve(url, fullpath)
    print('{} saved.'.format(filename))
    return None

FILENAME = 'Images URLs.csv'
FILE_PATH = 'Images/'
urls = pd.read_csv(FILENAME)
for i, url in enumerate(urls.values):
    url_to_jpg(i, url, FILE_PATH)
When I ran the code, it failed at this line
urllib.request.urlretrieve(url, fullpath)
with the following output and traceback:
Images/image-0.jpg
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-d92ed57d1d8e> in <module>
15
16 for i, url in enumerate(urls.values):
---> 17 url_to_jpg(i, url, FILE_PATH)
<ipython-input-36-d92ed57d1d8e> in url_to_jpg(i, url, file_path)
6 fullpath = '{}{}'.format(file_path, filename)
7 print(fullpath)
----> 8 urllib.request.urlretrieve(url, fullpath)
9 print('{} saved.'.format(filename))
10 return None
C:\ProgramData\Anaconda3\lib\urllib\request.py in urlretrieve(url, filename, reporthook, data)
243 data file as well as the resulting HTTPMessage object.
244 """
--> 245 url_type, path = _splittype(url)
246
247 with contextlib.closing(urlopen(url, data)) as fp:
C:\ProgramData\Anaconda3\lib\urllib\parse.py in _splittype(url)
1006 _typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
1007
-> 1008 match = _typeprog.match(url)
1009 if match:
1010 scheme, data = match.groups()
TypeError: cannot use a string pattern on a bytes-like object
Any ideas about that error?
** I have found a partial solution, which is to change the call inside the loop to this line
url_to_jpg(i, url[0], FILE_PATH)
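The reason `url[0]` helps is that `urls.values` is a two-dimensional array, so each `url` in the loop is a one-element row rather than a string. As a rough sketch of another way to get plain strings, the standard-library `csv` module can read the first column directly. The `read_urls` helper and the sample URLs are hypothetical, and the sketch assumes one URL per row with no header row.

```python
import csv
import io

def read_urls(csv_file):
    """Return the first column of every non-empty row as a plain str."""
    return [row[0] for row in csv.reader(csv_file) if row]

# Hypothetical stand-in for open('Images URLs.csv', newline='')
sample = io.StringIO('http://example.com/a.jpg\nhttp://example.com/b.jpg\n')
urls = read_urls(sample)

for i, url in enumerate(urls):
    # Each url is already a str, so no [0] indexing is needed here
    print(i, type(url).__name__, url)
```

With a list like this, the loop could pass `url` straight to `url_to_jpg`.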
But some of the links seem to reject the request, because I then got another error
HTTPError: HTTP Error 403: Forbidden
How can I overcome this?
** I tried adding headers (a User-Agent) as suggested, but I don't know how to finish it properly. How do I use urlretrieve in that case?
import urllib.request
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}
response = urllib.request.Request("http://www.gunnerkrigg.com//comics/00000001.jpg", headers=hdr)
print(urllib.request.urlopen(response))
urllib.request.urlretrieve(urllib.request.urlopen(response).read(),'oo.jpg')
#urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
Upvotes: 0
Views: 1236
Reputation: 1068
This code should get you past HTTPError: HTTP Error 403: Forbidden.
It is your code with a User-Agent header added.
import pandas as pd
import urllib.request

# build an opener
opener = urllib.request.build_opener()
# add a header for opener
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7')]
# install opener once
urllib.request.install_opener(opener)

def url_to_jpg(i, url, file_path):
    filename = 'image-{}.jpg'.format(i)
    fullpath = '{}{}'.format(file_path, filename)
    print(fullpath)
    urllib.request.urlretrieve(url, fullpath)
    print('{} saved.'.format(filename))
    return None

FILENAME = 'Images URLs.csv'
FILE_PATH = 'Images/'
urls = pd.read_csv(FILENAME)
for i, url in enumerate(urls.values):
    url_to_jpg(i, url[0], FILE_PATH)
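If you would rather not install a global opener, another option is to send the headers with each request, which is what the Request object in the question was trying to do. urlretrieve expects a URL string, not the downloaded bytes, so this sketch opens the request with urlopen and copies the response to disk instead. The `download` helper name and the short User-Agent string are assumptions for illustration; some servers may want a fuller browser string or may refuse automated requests regardless.

```python
import shutil
import urllib.error
import urllib.request

# Assumed browser-like header; the exact string is illustrative only
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def download(url, fullpath, headers=HEADERS):
    """Fetch url with custom headers and write the body to fullpath.

    Returns True on success, False if the server answers with an HTTP error.
    """
    request = urllib.request.Request(url, headers=headers)
    try:
        # urlopen runs first, so no file is created if the request fails
        with urllib.request.urlopen(request) as response, open(fullpath, 'wb') as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        print('{} failed: HTTP {}'.format(url, err.code))
        return False
    return True
```

Inside url_to_jpg, a call such as `download(url, fullpath)` could take the place of the urlretrieve line, and the False return lets the loop move on to the next URL instead of stopping at the first 403.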
Upvotes: 1