Reputation: 856

How to download google image search results in Python

This question has been asked numerous times before, but all answers are at least a couple years old and currently based on the ajax.googleapis.com API, which is no longer supported.

Does anyone know of another way? I'm trying to download a hundred or so search results, and in addition to Python APIs I've tried numerous desktop, browser-based, or browser-addon programs for doing this which all failed.

Upvotes: 28

Answers (10)

Andriy Ivaneyko

Reputation: 22061

Use the Google Custom Search API for this. See @i08in's answer to "Python - Download Images from google Image search?"; it has a good description, sample scripts, and library references.

Upvotes: 10

babatunde adewole

Reputation: 493

Make sure you install the icrawler library first:

pip install icrawler

from icrawler.builtin import GoogleImageCrawler
google_Crawler = GoogleImageCrawler(storage = {'root_dir': r'write the name of the directory you want to save to here'})
google_Crawler.crawl(keyword = 'sad human faces', max_num = 800)

Upvotes: 19

Avin_ash

Reputation: 61

A simple solution to this problem is to install a Python package called google_images_download:

pip install google_images_download

Then use this Python code:

from google_images_download import google_images_download  

response = google_images_download.googleimagesdownload()
keywords = "apple fruit"
arguments = {"keywords":keywords,"limit":20,"print_urls":True}
paths = response.download(arguments)
print(paths)

Adjust the limit to control the number of images to download.

Some images won't open, as they might be corrupt.

Change the keywords string to get the output you need.
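
One way to spot the files that won't open is to check the first bytes of each download against the usual image signatures. This is only a sketch using the standard library: the function names are my own, it covers JPEG, PNG and GIF only, and it looks at the header alone, so a truncated file with a valid header would still pass.

```python
import os

# Leading "magic" bytes for the formats most image searches return.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def detect_image_type(path):
    """Return 'jpeg', 'png' or 'gif' if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for name, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):
            return name
    return None


def find_suspect_files(directory):
    """List files under `directory` whose first bytes match no known image signature."""
    suspects = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            full_path = os.path.join(root, filename)
            if detect_image_type(full_path) is None:
                suspects.append(full_path)
    return sorted(suspects)
```

Calling find_suspect_files on the download folder returns the paths worth deleting or fetching again, for example HTML error pages that were saved with a .jpg extension.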

Upvotes: 1

Rodrigo Laguna

Reputation: 1850

I'm trying this library, which can be used both as a command-line tool and as a Python library. It has lots of arguments for finding images by different criteria.

These examples are taken from its documentation. To use it as a Python library:

from google_images_download import google_images_download   #importing the library

response = google_images_download.googleimagesdownload()   #class instantiation

arguments = {"keywords":"Polar bears,baloons,Beaches","limit":20,"print_urls":True}   #creating list of arguments
paths = response.download(arguments)   #passing the arguments to the function
print(paths)   #printing absolute paths of the downloaded images

or as a command-line tool, as follows:

$ googleimagesdownload -k "car" -sk 'red,blue,white' -l 10

You can install this with pip install google_images_download

Upvotes: 1

hnvasa

Reputation: 860

How about this one?

https://github.com/hardikvasa/google-images-download

It allows you to download hundreds of images and has a ton of filters to customize your search.

If you want to download more than 100 images per keyword, you will need to install 'selenium' along with 'chromedriver'.

If you installed the library with pip or ran the setup.py file, Selenium will have been installed automatically. You will also need the Chrome browser on your machine. For chromedriver:

Download the correct chromedriver for your operating system.

On Windows or macOS, if chromedriver gives you trouble for some reason, download it into the current directory and run the command.

On Windows, however, the path to chromedriver has to be given in the following format:

C:\complete\path\to\chromedriver.exe

On Linux, if you are having issues installing the Google Chrome browser, refer to this CentOS or Amazon Linux Guide or Ubuntu Guide.

On all operating systems you will have to use the '--chromedriver' or '-cd' argument to specify the path of the chromedriver that you have downloaded to your machine.
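
When calling the library from Python instead of the command line, the same path goes into the arguments dictionary under the "chromedriver" key. The sketch below is not from the project's documentation: build_arguments and download are hypothetical helper names, and the path you pass is whatever location you saved chromedriver to.

```python
def build_arguments(keywords, limit, chromedriver_path=None):
    """Assemble the arguments dict, adding the chromedriver path for large limits."""
    arguments = {"keywords": keywords, "limit": limit, "print_urls": True}
    if limit > 100:
        # More than 100 images per keyword goes through Selenium.
        if not chromedriver_path:
            raise ValueError("limits above 100 need a chromedriver path")
        arguments["chromedriver"] = chromedriver_path
    return arguments


def download(keywords, limit, chromedriver_path=None):
    """Run the download; requires google_images_download, Selenium and Chrome."""
    # Imported here so the helper above can be used without the package installed.
    from google_images_download import google_images_download

    response = google_images_download.googleimagesdownload()
    return response.download(build_arguments(keywords, limit, chromedriver_path))
```

On Windows, pass the path as a raw string, e.g. r"C:\complete\path\to\chromedriver.exe", so the backslashes are not treated as escapes.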

Upvotes: 3

Ravi Hirani

Reputation: 6539

I tried many code samples, but none of them worked for me, so I am posting my working code here. I hope it helps others.

I am using Python 3.6 with icrawler.

First, install icrawler on your system.

Then run the code below.

from icrawler.examples import GoogleImageCrawler
google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='krishna', max_num=100)

Replace the keyword krishna with your desired text.

Note: downloaded images need a destination path. Here I used the directory where the script is located. You can set a custom directory with the code below.

google_crawler = GoogleImageCrawler('path_to_your_folder')

Upvotes: 0

Soumya Boral

Reputation: 1349

Improving a bit on Ravi Hirani's answer, the simplest way is this:

from icrawler.builtin import GoogleImageCrawler

google_crawler = GoogleImageCrawler(storage={'root_dir': 'D:\\projects\\data core\\helmet detection\\images'})
google_crawler.crawl(keyword='cat', max_num=100)

Source : https://pypi.org/project/icrawler/

Upvotes: 6

atif93

Reputation: 401

To download any number of images from Google image search using Selenium:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import json
import urllib2
import sys
import time

# adding path to geckodriver to the OS environment variable
# assuming that it is stored at the same path as this script
os.environ["PATH"] += os.pathsep + os.getcwd()
download_path = "dataset/"

def main():
    searchtext = sys.argv[1] # the search query
    num_requested = int(sys.argv[2]) # number of images to download
    number_of_scrolls = num_requested / 400 + 1 
    # number_of_scrolls * 400 images will be opened in the browser

    if not os.path.exists(download_path + searchtext.replace(" ", "_")):
        os.makedirs(download_path + searchtext.replace(" ", "_"))

    url = "https://www.google.co.in/search?q="+searchtext+"&source=lnms&tbm=isch"
    driver = webdriver.Firefox()
    driver.get(url)

    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    extensions = {"jpg", "jpeg", "png", "gif"}
    img_count = 0
    downloaded_img_count = 0

    for _ in xrange(number_of_scrolls):
        for __ in xrange(10):
            # multiple scrolls needed to show all 400 images
            driver.execute_script("window.scrollBy(0, 1000000)")
            time.sleep(0.2)
        # to load next 400 images
        time.sleep(0.5)
        try:
            driver.find_element_by_xpath("//input[@value='Show more results']").click()
        except Exception as e:
            print "Less images found:", e
            break

    # imges = driver.find_elements_by_xpath('//div[@class="rg_meta"]') # not working anymore
    imges = driver.find_elements_by_xpath('//div[contains(@class,"rg_meta")]')
    print "Total images:", len(imges), "\n"
    for img in imges:
        img_count += 1
        img_url = json.loads(img.get_attribute('innerHTML'))["ou"]
        img_type = json.loads(img.get_attribute('innerHTML'))["ity"]
        print "Downloading image", img_count, ": ", img_url
        try:
            if img_type not in extensions:
                img_type = "jpg"
            req = urllib2.Request(img_url, headers=headers)
            raw_img = urllib2.urlopen(req).read()
            f = open(download_path+searchtext.replace(" ", "_")+"/"+str(downloaded_img_count)+"."+img_type, "wb")
            f.write(raw_img)
            f.close()
            downloaded_img_count += 1
        except Exception as e:
            print "Download failed:", e
        finally:
            print
        if downloaded_img_count >= num_requested:
            break

    print "Total downloaded: ", downloaded_img_count, "/", img_count
    driver.quit()

if __name__ == "__main__":
    main()
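
The script above is written for Python 2 (print statements, xrange, urllib2). If you port it to Python 3, two lines change behaviour rather than just syntax: `/` now returns a float, which range() rejects, and the search text is inserted into the URL without encoding. A minimal sketch of those two pieces, with helper names of my own choosing:

```python
from urllib.parse import quote_plus

IMAGES_PER_PAGE = 400  # batch size assumed by the script above


def scrolls_needed(num_requested, per_page=IMAGES_PER_PAGE):
    """Integer number of scroll rounds; `//` keeps the result usable by range()."""
    return num_requested // per_page + 1


def build_search_url(searchtext):
    """Same URL as the script above, with the query percent-encoded."""
    return ("https://www.google.co.in/search?q=" + quote_plus(searchtext)
            + "&source=lnms&tbm=isch")
```

With these, the loop header becomes `for _ in range(scrolls_needed(num_requested)):` and queries containing characters such as `&` no longer break the URL.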

Full code is here.

Upvotes: 7

rishabhr0y

Reputation: 868

I have been using this script to download images from Google search to train my classifiers. The code below can download 100 images related to the query.

from bs4 import BeautifulSoup
import requests
import re
import urllib2
import os
import cookielib
import json

def get_soup(url,header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)),'html.parser')


query = raw_input("query image")# you can change the query for the image  here
image_type="ActiOn"
query= query.split()
query='+'.join(query)
url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch"
print url
#add the directory for your image here
DIR="Pictures"
header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url,header)


ActualImages=[]# contains the link for Large original images, type of  image
for a in soup.find_all("div",{"class":"rg_meta"}):
    link , Type =json.loads(a.text)["ou"]  ,json.loads(a.text)["ity"]
    ActualImages.append((link,Type))

print  "there are total" , len(ActualImages),"images"

if not os.path.exists(DIR):
    os.mkdir(DIR)
DIR = os.path.join(DIR, query.split()[0])

if not os.path.exists(DIR):
    os.mkdir(DIR)
###print images
for i , (img , Type) in enumerate( ActualImages):
    try:
        req = urllib2.Request(img, headers=header)
        raw_img = urllib2.urlopen(req).read()

        cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
        print cntr
        if len(Type)==0:
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+".jpg"), 'wb')
        else :
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+"."+Type), 'wb')


        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : "+img
        print e
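
The core of the script is the rg_meta step: each matching div holds a JSON object whose "ou" field is the original image URL and "ity" the image type. Here is that step in isolation, as a Python 3 sketch that uses only the standard library instead of BeautifulSoup. The sample markup is made up to match what the script expects; whether Google's result page still contains such divs is an assumption you should check against the live HTML.

```python
import json
from html.parser import HTMLParser


class RgMetaParser(HTMLParser):
    """Collect (url, type) pairs from <div class="... rg_meta ..."> blocks."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "rg_meta" in classes:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._inside:
            self._inside = False
            meta = json.loads("".join(self._chunks))  # parse once, reuse below
            self.results.append((meta["ou"], meta.get("ity", "")))


# Hypothetical snippet shaped like the markup the script relies on.
SAMPLE = (
    '<div class="rg_meta notranslate">'
    '{"ou": "https://example.com/cat.png", "ity": "png"}</div>'
    '<div class="rg_meta">{"ou": "https://example.com/dog", "ity": ""}</div>'
)

parser = RgMetaParser()
parser.feed(SAMPLE)
print(parser.results)
```

Parsing the JSON once per div also avoids the double json.loads call in the loop above.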

Upvotes: 2

Lincoln Lorscheider

Reputation: 142

You need to use the custom search API. There is a handy explorer here. I use urllib2. You also need to create an API key for your application from the developer console.
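
Roughly, a request to the Custom Search JSON API looks like the sketch below. It uses urllib.request, the Python 3 counterpart of urllib2. The api_key and engine_id arguments are placeholders for the key from the developer console and the ID of your own search engine, and the function names are mine. As far as I know the API returns at most 10 items per call and stops at about 100 results per query, so treat the paging loop as an assumption to verify against your quota.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key, engine_id, query, start=1, num=10):
    """Build one image-search request; `num` may be at most 10 per call."""
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": query,
        "searchType": "image",
        "start": start,
        "num": num,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_links(response):
    """Pull the image URLs out of a decoded JSON response."""
    return [item["link"] for item in response.get("items", [])]


def search_images(api_key, engine_id, query, total=100):
    """Page through results 10 at a time (needs network access and valid credentials)."""
    links = []
    for start in range(1, total + 1, 10):
        with urlopen(build_request_url(api_key, engine_id, query, start)) as reply:
            links.extend(extract_links(json.load(reply)))
    return links[:total]
```

The returned links can then be downloaded one by one with urlopen, the same way the scraping answers above save each image.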

Upvotes: 0
