super trainee

Reputation: 117

Download images from webpage

I've tried to download images from a webpage. What am I missing here?

    import urllib
    from urllib.request import urlopen, Request 
    import requests
    from bs4 import BeautifulSoup
    import os

    urlpage = 'https://www.google.com/search?site=imghp&tbm=isch&source=hp&biw=1414&bih=709&q=little+cofee'
    header = {'User-Agent': 'Mozilla/5.0'}
    page = urlopen(Request(urlpage, headers=header))
    soup = BeautifulSoup(page)

    images = soup.find_all("div", {"class": "thumb-pic"})
    for image in images:
        imgUrl = image.a['href'].split("imgurl=")[1]
        urllib.request.urlretrieve(imgUrl, os.path.basename(imgUrl))

Upvotes: 0

Views: 3182

Answers (1)

Alex Ivanov

Reputation: 833

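It's tricky. Sites often use relative URLs such as "images/img.jpg", "/images/img.jpg", or "../images/img.jpg", which have to be resolved against the page's address before they can be downloaded. But the Google page you are trying to scrape has no HTML tags at all. It contains just JavaScript.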
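As a small illustration of the relative-URL problem, the standard library's `urljoin` can resolve each of those forms against the page address. This is only a sketch in Python 3: the page URL is the blog page used later in this answer, the three `src` values are the forms quoted above, and `resolve_all` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Page address (the blog page from this answer) and the relative forms quoted above.
page_url = 'http://www.blogto.com/cafes/little-nickys-coffee-toronto'
relative_srcs = ['images/img.jpg', '/images/img.jpg', '../images/img.jpg']


def resolve_all(base, srcs):
    """Hypothetical helper: turn each src value into an absolute URL."""
    return [urljoin(base, src) for src in srcs]


resolved = resolve_all(page_url, relative_srcs)
for absolute in resolved:
    print(absolute)
# http://www.blogto.com/cafes/images/img.jpg
# http://www.blogto.com/images/img.jpg
# http://www.blogto.com/images/img.jpg
```

Note that a path without a leading slash is resolved relative to the page's directory, not the site root, which is why the first result differs from the other two.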

I made a quick and dirty example just to show you how it might work in Python 2.7, but you can also just save the page from your browser, and all the images will be saved neatly in a folder.

    #!/usr/bin/python
    # Python 2.7

    import urllib
    import urlparse

    url = 'http://www.blogto.com/cafes/little-nickys-coffee-toronto'
    exts = ['.jpg', '.png', '.gif']  # image types to download

    response = urllib.urlopen(url)
    html = response.read()

    # Collect every unique src="..." value that looks like an image.
    imgs = []
    for item in html.split('src="')[1:]:  # skip the text before the first src="
        item = item[:item.find('"')].strip()
        # Resolve relative and protocol-relative URLs against the page URL.
        item = urlparse.urljoin(url, item)
        if any(e in item for e in exts) and item not in imgs:
            imgs.append(item)

    n = len(imgs)
    print 'Found', n, 'images'
    for i, img in enumerate(imgs, 1):
        # Zero-padded sequence number plus the original extension, e.g. 07.jpg
        filename = str(i).zfill(len(str(n))) + img[img.rfind('.'):]
        try:
            print img
            f = open(filename, 'wb')
            f.write(urllib.urlopen(img).read())
            f.close()
        except IOError:
            print "Download failed:", img

    print 'Done!'
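Since the question uses Python 3, here is a rough Python 3 sketch of the same collection step, using the standard library's `HTMLParser` instead of string splitting. It is not a drop-in replacement for the script above: the names `ImgSrcCollector`, `find_image_urls`, and `IMAGE_EXTENSIONS`, as well as the sample HTML and the `example.com` address, are assumptions made for illustration. It only looks at `<img>` tags and matches the extension against the URL path, so it will still find nothing on a page that builds its images with JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

IMAGE_EXTENSIONS = ('.jpg', '.png', '.gif')  # image types to keep


class ImgSrcCollector(HTMLParser):
    """Collects unique absolute image URLs from <img src="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        if not src:
            return
        # Resolve relative paths against the page address.
        absolute = urljoin(self.base_url, src.strip())
        path = urlparse(absolute).path.lower()
        if path.endswith(IMAGE_EXTENSIONS) and absolute not in self.urls:
            self.urls.append(absolute)


def find_image_urls(html, base_url):
    """Return image URLs found in html, in order of first appearance."""
    collector = ImgSrcCollector(base_url)
    collector.feed(html)
    return collector.urls


# Made-up HTML for demonstration only; no network access is needed.
sample = ('<p><img src="/images/a.jpg"><img src="b.png">'
          '<img src="/images/a.jpg"><script src="x.js"></script></p>')
print(find_image_urls(sample, 'http://example.com/cafes/page'))
# ['http://example.com/images/a.jpg', 'http://example.com/cafes/b.png']
```

Each returned URL could then be saved with `urllib.request.urlretrieve`, as in the question.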

Upvotes: 1
