Reputation: 117
I've tried to download images from a webpage. What am I missing here, please?
import urllib
from urllib.request import urlopen, Request
import requests
from bs4 import BeautifulSoup
import os
urlpage ='https://www.google.com/search?site=imghp&tbm=isch&source=hp&biw=1414&bih=709&q=little+cofee'
header = {'User-Agent': 'Mozilla/5.0'}
page = urlopen(Request(urlpage,headers=header))
soup = BeautifulSoup(page)
images = soup.find_all("div", {"class":"thumb-pic"})
for image in images:
    imgUrl = image.a['href'].split("imgurl=")[1]
    urllib.request.urlretrieve(imgUrl, os.path.basename(imgUrl))
Upvotes: 0
Views: 3182
Reputation: 833
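It's tricky. Pages sometimes use relative URLs such as "images/img.jpg", "/images/img.jpg", or "../images/img.jpg", which have to be resolved against the page address before they can be downloaded. But the Google page you are trying has no HTML image tags to parse at all; it contains just JavaScript.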
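To illustrate the relative-URL point, here is a minimal sketch, assuming Python 3 and the standard-library `urllib.parse.urljoin`. The page URL and the `resolve` helper are placeholders made up for the illustration, not part of the original question.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
page_url = "http://example.com/cafes/little-coffee/index.html"


def resolve(src, base=page_url):
    """Turn the value of a src attribute into an absolute URL."""
    return urljoin(base, src)


# The three relative forms mentioned above, resolved against the page URL.
for src in ["images/img.jpg", "/images/img.jpg", "../images/img.jpg"]:
    print(src, "->", resolve(src))
```

Each form resolves to a different absolute address, so it is probably safer to let `urljoin` do the work than to concatenate strings by hand.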
I made a quick and dirty example just to show you how it might work in Python 2.7. Alternatively, you can simply save the page from your browser, and all of its images will be saved neatly in a folder.
#!/usr/bin/python
# Python 2.7
import urllib

url = 'http://www.blogto.com/cafes/little-nickys-coffee-toronto'
ext = ['.jpg', '.png', '.gif']  # image types to download

response = urllib.urlopen(url)
html = response.read()

# Collect every src="..." value that looks like an image.
IMGs = []
L = html.split('src="')
for item in L:
    item = item[:item.find('"')]
    item = item.strip()
    if item.find('http') == -1:
        # Relative URL: prefix the site root (assumes item starts with '/').
        item = url[:url.find('/', 10)] + item
    for e in ext:
        if item.find(e) != -1:
            if item not in IMGs:
                IMGs.append(item)

n = len(IMGs)
print 'Found', n, 'images'

# Save each image under a zero-padded number: 01.jpg, 02.png, ...
i = 1
for img in IMGs:
    img_ext = img[img.rfind('.'):]
    filename = '0' * (len(str(n)) - len(str(i))) + str(i)
    i += 1
    try:
        print img
        f = open(filename + img_ext, 'wb')
        f.write(urllib.urlopen(img).read())
        f.close()
    except Exception:
        print "Unpredictable error:", img
print 'Done!'
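For anyone on Python 3, the same idea could be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not a drop-in replacement: the names `ImgSrcParser`, `extract_image_urls`, `numbered_filenames`, and `download_images` are made up here, and it reads only `<img src="...">` tags instead of splitting the raw HTML on `src="`.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

IMAGE_EXTS = (".jpg", ".png", ".gif")  # same image types as the script above


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src.strip())


def extract_image_urls(html, base_url):
    """Return absolute, de-duplicated image URLs found in html, in page order."""
    parser = ImgSrcParser()
    parser.feed(html)
    urls = []
    for src in parser.sources:
        absolute = urljoin(base_url, src)  # resolves relative paths
        ext = os.path.splitext(urlparse(absolute).path)[1].lower()
        if ext in IMAGE_EXTS and absolute not in urls:
            urls.append(absolute)
    return urls


def numbered_filenames(urls):
    """Pair each URL with a zero-padded name such as 01.jpg, 02.png, ..."""
    width = len(str(len(urls)))
    return [
        (url, str(i).zfill(width) + os.path.splitext(urlparse(url).path)[1])
        for i, url in enumerate(urls, start=1)
    ]


def download_images(page_url, folder="."):
    """Fetch page_url and save every matching image into folder."""
    headers = {"User-Agent": "Mozilla/5.0"}
    with urlopen(Request(page_url, headers=headers)) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url, name in numbered_filenames(extract_image_urls(html, page_url)):
        try:
            with urlopen(Request(url, headers=headers)) as img, \
                    open(os.path.join(folder, name), "wb") as out:
                out.write(img.read())
        except OSError as err:  # URLError and HTTPError are subclasses of OSError
            print("Could not download", url, "-", err)
```

Calling `download_images('http://www.blogto.com/cafes/little-nickys-coffee-toronto')` would attempt the same job as the script above. It does not help with the Google results page, since a parser like this only sees image tags that are present in the HTML the server returns.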
Upvotes: 1