Dhvani Shah

Reputation: 371

Scrape Google images based on search term

I want to scrape all the images shown in the Google Images results for the search term "happiness".

I have tried many approaches, but I can only fetch 20 images. Here is my Python code:

import os
import re
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def get_soup(url, header):
    # Fetch the page with a browser-like User-Agent and parse the HTML
    return BeautifulSoup(urlopen(Request(url, headers=header)), "html.parser")


query = input("Search term: ")  # e.g. happiness; you can change the query for the images here
image_type = "Action"
query = "+".join(query.split())
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
print(url)

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url, header)

# Add the directory for your images here: Pictures/<query>
DIR = os.path.join("Pictures", query)
os.makedirs(DIR, exist_ok=True)

images = [a["src"] for a in soup.find_all("img", {"src": re.compile("gstatic.com")})]
print(images)
print("there are total", len(images), "images")

for img in images:
    raw_img = urlopen(img).read()
    # Number each file after the images of this type already in DIR
    cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
    print(cntr)
    with open(os.path.join(DIR, image_type + "_" + str(cntr) + ".jpg"), "wb") as f:
        f.write(raw_img)

How can I extract all of the images, not just the first 20?

Upvotes: 1

Views: 2922

Answers (2)

pixie999

Reputation: 498

Google Images returns only 20 images per request; subsequent results are loaded as you scroll. To control which batch of 20 results is returned, you can use the start parameter in the URL.

For example, this prints the image URLs for the number of results you specify:

import re

import requests
from bs4 import BeautifulSoup

num_res = 400  # total number of results to fetch
base_url = "https://www.google.co.in/search?q=happiness&source=lnms&tbm=isch&start={}"

# Google serves 20 image results per page, so step through the start offsets
for start in range(0, num_res, 20):
    r = requests.get(base_url.format(start))
    soup = BeautifulSoup(r.content, "lxml")
    # Print the thumbnail URLs (served from gstatic.com) found on this page
    print([img["src"] for img in soup.find_all("img", src=re.compile("gstatic.com"))])
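As a minimal sketch of the same paging idea, the per-page URLs can be built with urllib.parse.urlencode instead of string formatting, which also escapes multi-word queries. The helper name page_urls is hypothetical, and the page size of 20 is an assumption taken from the behaviour described in this answer, not a documented Google contract.

```python
from urllib.parse import urlencode

BASE = "https://www.google.co.in/search"
PAGE_SIZE = 20  # results per page, as observed in this answer (an assumption)


def page_urls(query, num_res, page_size=PAGE_SIZE):
    """Return one image-search URL per page of results (hypothetical helper)."""
    urls = []
    for start in range(0, num_res, page_size):
        # urlencode escapes spaces and special characters in the query
        params = {"q": query, "source": "lnms", "tbm": "isch", "start": start}
        urls.append(BASE + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for u in page_urls("happiness", 60):
        print(u)
```

Each URL can then be fetched and parsed exactly as in the loop above.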

This answer is only meant to satisfy your curiosity; the proper way to do this is through Google's search APIs.
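For reference, here is a rough sketch of the API route using Google's Custom Search JSON API, assuming you already have an API key and a search engine ID (cx) set up to search the web. The helper names are hypothetical; the endpoint and the key, cx, q, searchType, num and start parameters belong to that API, where num is capped at 10 per request and start is 1-based. Please check the current API documentation before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_image_url(api_key, cx, query, start=1, num=10):
    """Build a Custom Search request for one page of image results (hypothetical helper)."""
    params = {
        "key": api_key,
        "cx": cx,
        "q": query,
        "searchType": "image",
        "num": num,      # at most 10 results per request
        "start": start,  # 1-based index of the first result
    }
    return ENDPOINT + "?" + urlencode(params)


def image_links(response):
    """Extract the image URLs from a decoded JSON response."""
    return [item["link"] for item in response.get("items", [])]


def fetch_image_links(api_key, cx, query, start=1):
    # Network call: requires a valid API key and search engine ID
    with urlopen(cse_image_url(api_key, cx, query, start)) as resp:
        return image_links(json.load(resp))
```

To page through results, call fetch_image_links with start=1, 11, 21 and so on.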

Upvotes: 1

jvmvik

Reputation: 319

We built a solution for scraping Google Images: SerpAPI is a web service that converts Google Images results into JSON. We provide integrations for the most popular platforms: Python, Ruby, Java, NodeJS, etc.
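A rough sketch of consuming such a JSON service from Python with only the standard library might look like this. The endpoint https://serpapi.com/search.json, the engine=google_images parameter, and the images_results / original response fields are assumptions about SerpAPI's interface, as are the hypothetical helper names; verify all of them, and supply your own API key, against SerpAPI's current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed SerpAPI endpoint and parameter names; verify against SerpAPI's docs.
SERPAPI_URL = "https://serpapi.com/search.json"


def serpapi_request_url(query, api_key):
    """Build the request URL for a Google Images search (hypothetical helper)."""
    params = {"engine": "google_images", "q": query, "api_key": api_key}
    return SERPAPI_URL + "?" + urlencode(params)


def original_image_urls(result):
    # Assumes each entry in "images_results" may carry an "original" image URL
    return [img["original"] for img in result.get("images_results", []) if "original" in img]


def search_images(query, api_key):
    # Network call: requires a valid SerpAPI key
    with urlopen(serpapi_request_url(query, api_key)) as resp:
        return original_image_urls(json.load(resp))
```

Because the service returns structured JSON, no HTML parsing is needed on the client side.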

Upvotes: 5
