Reputation: 856
This question has been asked many times before, but all the answers are at least a couple of years old and rely on the ajax.googleapis.com API, which is no longer supported.
Does anyone know of another way? I'm trying to download a hundred or so search results, and in addition to Python APIs I've tried numerous desktop, browser-based, and browser add-on programs for this, all of which failed.
Upvotes: 28
Views: 62865
Reputation: 22021
Use the Google Custom Search API for this. See @i08in's answer to "Python - Download Images from google Image search?"; it has a thorough description, script samples, and library references.
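If you go the Custom Search route, note that each request returns at most 10 results (the num parameter), so "a hundred or so" images means paging with the start parameter. The sketch below only computes the (start, num) pairs; the page size of 10 and the 100-result ceiling are taken from my reading of the Custom Search JSON API docs, so treat them as assumptions and confirm the exact bounds there. page_requests is a hypothetical helper name.

```python
from typing import List, Tuple

CSE_MAX_PER_REQUEST = 10  # assumed upper bound of the 'num' parameter
CSE_MAX_RESULTS = 100     # assumed total-results ceiling per query


def page_requests(wanted: int,
                  per_request: int = CSE_MAX_PER_REQUEST,
                  cap: int = CSE_MAX_RESULTS) -> List[Tuple[int, int]]:
    """Return (start, num) pairs covering the first `wanted` results.

    'start' is 1-based, as in the Custom Search API; the last page is
    shortened so exactly `wanted` results are requested.
    """
    wanted = min(wanted, cap)  # never ask past the ceiling
    return [(start, min(per_request, wanted - start + 1))
            for start in range(1, wanted + 1, per_request)]
```

For example, page_requests(25) gives three requests: 10 results from 1, 10 from 11, and 5 from 21.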
Upvotes: 10
Reputation: 493
Make sure you install the icrawler library first:
pip install icrawler
from icrawler.builtin import GoogleImageCrawler
# root_dir is the folder the downloaded images are saved to
google_crawler = GoogleImageCrawler(storage={'root_dir': r'write the name of the directory you want to save to here'})
# download up to 800 images for the search phrase
google_crawler.crawl(keyword='sad human faces', max_num=800)
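If you need several search phrases, one option is to give each phrase its own folder so the results stay separate. This is a minimal sketch using only the GoogleImageCrawler(storage=...) and crawl(keyword=..., max_num=...) calls shown above; keyword_to_dirname and crawl_keywords are hypothetical helper names, and the folder-naming rule is my own choice.

```python
import os
import re


def keyword_to_dirname(keyword: str) -> str:
    """Turn a search phrase into a folder name, e.g. 'sad human faces' -> 'sad_human_faces'."""
    name = re.sub(r"[^\w\-]+", "_", keyword.strip().lower())
    return name.strip("_") or "images"  # fall back if nothing usable is left


def crawl_keywords(keywords, root_dir, max_num=100):
    """Crawl each keyword into root_dir/<keyword folder> with icrawler."""
    # imported here so the naming helper above works without icrawler installed
    from icrawler.builtin import GoogleImageCrawler

    for keyword in keywords:
        target = os.path.join(root_dir, keyword_to_dirname(keyword))
        crawler = GoogleImageCrawler(storage={"root_dir": target})
        crawler.crawl(keyword=keyword, max_num=max_num)
```

Usage: crawl_keywords(["sad human faces", "happy human faces"], "faces", max_num=100).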
Upvotes: 19
Reputation: 61
A simple solution is to install a Python package called google_images_download:
pip install google_images_download
and use this Python code:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
keywords = "apple fruit"
arguments = {"keywords":keywords,"limit":20,"print_urls":True}
paths = response.download(arguments)
print(paths)
Adjust limit to control the number of images to download, but note that some images won't open because they may be corrupt.
Change the keywords string to get the output you need.
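To find the downloads that won't open, one quick check is to compare each file's first bytes with the signatures of common image formats (JPEG, PNG, GIF, WebP). This is a sketch of that technique using only the standard library; sniff_image_type and find_suspect_files are hypothetical names, and you would point it at whatever folder the images were saved to. It only inspects the header, so it catches HTML error pages or empty files saved as images, but not a JPEG that is truncated halfway.

```python
import os

# Leading "magic" bytes of common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(path):
    """Return the detected image format of a file, or None if unrecognised."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":  # WebP is a RIFF container
        return "webp"
    return None


def find_suspect_files(folder):
    """List files in folder whose header matches none of the known formats."""
    suspects = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and sniff_image_type(path) is None:
            suspects.append(path)
    return suspects
```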
Upvotes: 1
Reputation: 1850
I'm trying this library, which can be used both as a command-line tool and as a Python library. It has lots of arguments for finding images by different criteria.
These examples are taken from its documentation. To use it as a Python library:
from google_images_download import google_images_download #importing the library
response = google_images_download.googleimagesdownload() #class instantiation
arguments = {"keywords":"Polar bears,balloons,Beaches","limit":20,"print_urls":True} #creating dict of arguments
paths = response.download(arguments) #passing the arguments to the function
print(paths) #printing absolute paths of the downloaded images
Or as a command-line tool:
$ googleimagesdownload --k "car" -sk 'red,blue,white' -l 10
You can install this with pip install google_images_download
Upvotes: 1
Reputation: 860
How about this one?
https://github.com/hardikvasa/google-images-download
It lets you download hundreds of images and has plenty of filters for customizing your search.
If you want to download more than 100 images per keyword, you will need to install 'selenium' along with 'chromedriver'.
If you pip-installed the library or ran the setup.py file, Selenium will have been installed automatically. You will also need the Chrome browser on your machine. For chromedriver:
Download the correct chromedriver for your operating system.
On Windows or Mac, if chromedriver gives you trouble for some reason, download it into the current directory and run the command.
On Windows, however, the path to chromedriver has to be given in the following format:
C:\complete\path\to\chromedriver.exe
On Linux, if you are having issues installing the Google Chrome browser, refer to this CentOS or Amazon Linux guide or Ubuntu guide.
For all operating systems, you will have to use the '--chromedriver' or '-cd' argument to specify the path of the chromedriver you downloaded.
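When calling the library from Python instead of the command line, you can check the chromedriver path up front before a long download starts. This sketch assumes the Python arguments dict accepts a "chromedriver" key mirroring the --chromedriver flag (I have not verified that against the library source); build_download_arguments is a hypothetical helper, and the 100-image threshold comes from the text above.

```python
import os


def build_download_arguments(keyword, limit, chromedriver_path=None):
    """Build an arguments dict, requiring a real chromedriver for limit > 100."""
    arguments = {"keywords": keyword, "limit": limit, "print_urls": True}
    if limit > 100:
        # more than 100 images per keyword needs Selenium + chromedriver
        if not chromedriver_path or not os.path.isfile(chromedriver_path):
            raise FileNotFoundError(
                "limit > 100 needs a valid chromedriver path, got %r" % chromedriver_path)
        # assumed key name, mirroring the --chromedriver / -cd CLI argument
        arguments["chromedriver"] = chromedriver_path
    return arguments
```

Usage: response.download(build_download_arguments("car", 500, r"C:\complete\path\to\chromedriver.exe")), where response is the googleimagesdownload() instance.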
Upvotes: 3
Reputation: 6539
I tried many code samples, but none of them worked for me, so I'm posting my working code here. I hope it helps others.
I am using Python 3.6 with icrawler.
First, install icrawler on your system (pip install icrawler).
Then run the code below:
from icrawler.examples import GoogleImageCrawler
google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='krishna', max_num=100)
Replace the keyword krishna with your desired text.
Note: the downloaded images need a save path. Right now they go into the same directory as the script; you can set a custom directory with the code below:
google_crawler = GoogleImageCrawler('path_to_your_folder')
Upvotes: 0
Reputation: 1339
Improving a bit on Ravi Hirani's answer, the simplest way is this:
from icrawler.builtin import GoogleImageCrawler
google_crawler = GoogleImageCrawler(storage={'root_dir': 'D:\\projects\\data core\\helmet detection\\images'})
google_crawler.crawl(keyword='cat', max_num=100)
Source : https://pypi.org/project/icrawler/
Upvotes: 6
Reputation: 401
To download any number of images from Google image search using Selenium:
import json
import os
import sys
import time
import urllib.parse
import urllib.request

from selenium import webdriver
from selenium.webdriver.common.by import By

# adding path to geckodriver to the OS environment variable
# assuming that it is stored at the same path as this script
os.environ["PATH"] += os.pathsep + os.getcwd()
download_path = "dataset/"


def main():
    searchtext = sys.argv[1]  # the search query
    num_requested = int(sys.argv[2])  # number of images to download
    number_of_scrolls = num_requested // 400 + 1
    # number_of_scrolls * 400 images will be opened in the browser

    target_dir = os.path.join(download_path, searchtext.replace(" ", "_"))
    os.makedirs(target_dir, exist_ok=True)

    url = ("https://www.google.co.in/search?q=" + urllib.parse.quote_plus(searchtext)
           + "&source=lnms&tbm=isch")
    driver = webdriver.Firefox()
    driver.get(url)

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                             "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"}
    extensions = {"jpg", "jpeg", "png", "gif"}
    img_count = 0
    downloaded_img_count = 0

    for _ in range(number_of_scrolls):
        for __ in range(10):
            # multiple scrolls needed to show all 400 images
            driver.execute_script("window.scrollBy(0, 1000000)")
            time.sleep(0.2)
        # to load next 400 images
        time.sleep(0.5)
        try:
            driver.find_element(By.XPATH, "//input[@value='Show more results']").click()
        except Exception as e:
            print("Less images found:", e)
            break

    # imges = driver.find_elements(By.XPATH, '//div[@class="rg_meta"]')  # not working anymore
    imges = driver.find_elements(By.XPATH, '//div[contains(@class,"rg_meta")]')
    print("Total images:", len(imges), "\n")
    for img in imges:
        img_count += 1
        meta = json.loads(img.get_attribute("innerHTML"))
        img_url = meta["ou"]   # original image URL
        img_type = meta["ity"]  # image file type
        print("Downloading image", img_count, ": ", img_url)
        try:
            if img_type not in extensions:
                img_type = "jpg"
            req = urllib.request.Request(img_url, headers=headers)
            raw_img = urllib.request.urlopen(req).read()
            file_name = os.path.join(target_dir, str(downloaded_img_count) + "." + img_type)
            with open(file_name, "wb") as f:  # 'with' closes the file automatically
                f.write(raw_img)
            downloaded_img_count += 1
        except Exception as e:
            print("Download failed:", e)
        finally:
            print()
        if downloaded_img_count >= num_requested:
            break

    print("Total downloaded: ", downloaded_img_count, "/", img_count)
    driver.quit()


if __name__ == "__main__":
    main()
Full code is here.
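One weakness of the download step above is that urlopen is called without a timeout, so a single slow image server can stall the whole run. Below is a minimal sketch of the same fetch-and-save step with a timeout; save_url and DEFAULT_HEADERS are hypothetical names, and the 10-second default is an arbitrary choice.

```python
import os
import urllib.request

DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def save_url(url, dest_path, headers=None, timeout=10.0):
    """Fetch url and write the bytes to dest_path; return the number of bytes written."""
    req = urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)
    # the timeout makes a stalled server raise an error instead of hanging forever
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with open(dest_path, "wb") as f:
        f.write(data)
    return len(data)
```

In the script above, the two urllib.request lines and the with open(...) block inside the try could be replaced by save_url(img_url, file_name, headers), keeping the existing except clause to skip failures.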
Upvotes: 7
Reputation: 868
I have been using this script to download images from Google search to train my classifiers. The code below can download 100 images related to the query:
import json
import os
import urllib.request

from bs4 import BeautifulSoup


def get_soup(url, header):
    return BeautifulSoup(urllib.request.urlopen(urllib.request.Request(url, headers=header)), 'html.parser')


query = input("query image: ")  # you can change the query for the image here
image_type = "ActiOn"
query = '+'.join(query.split())
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
print(url)
# add the directory for your image here
DIR = "Pictures"
header = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
soup = get_soup(url, header)

ActualImages = []  # contains the link for Large original images, type of image
for a in soup.find_all("div", {"class": "rg_meta"}):
    meta = json.loads(a.text)
    link, Type = meta["ou"], meta["ity"]
    ActualImages.append((link, Type))

print("there are total", len(ActualImages), "images")

DIR = os.path.join(DIR, query.split()[0])
os.makedirs(DIR, exist_ok=True)  # creates Pictures/ and the query subfolder if missing

### print images
for i, (img, Type) in enumerate(ActualImages):
    try:
        req = urllib.request.Request(img, headers=header)  # reuse the same User-Agent header dict
        raw_img = urllib.request.urlopen(req).read()
        cntr = len([name for name in os.listdir(DIR) if image_type in name]) + 1
        print(cntr)
        ext = Type if Type else "jpg"  # fall back to .jpg when the type is empty
        with open(os.path.join(DIR, image_type + "_" + str(cntr) + "." + ext), 'wb') as f:
            f.write(raw_img)
    except Exception as e:
        print("could not load : " + img)
        print(e)
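The core of this script is pulling the "ou" (original URL) and "ity" (image type) fields out of the JSON inside each rg_meta div. For testing that step offline, or running it without BeautifulSoup, here is a sketch of the same extraction with the standard library's html.parser. It assumes, as both scripts here do, that the results page still embeds rg_meta divs containing plain JSON text; Google's markup changes over time, so that assumption may no longer hold. RgMetaParser and parse_rg_meta are hypothetical names.

```python
import json
from html.parser import HTMLParser


class RgMetaParser(HTMLParser):
    """Collect (url, type) pairs from <div class="rg_meta ..."> JSON blobs."""

    def __init__(self):
        super().__init__()
        self._in_meta = False
        self._buf = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "rg_meta" in classes:  # also matches e.g. class="rg_meta notranslate"
                self._in_meta = True
                self._buf = []

    def handle_data(self, data):
        if self._in_meta:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._in_meta:
            meta = json.loads("".join(self._buf))
            self.images.append((meta["ou"], meta.get("ity", "")))
            self._in_meta = False


def parse_rg_meta(html):
    parser = RgMetaParser()
    parser.feed(html)
    parser.close()
    return parser.images
```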
Upvotes: 2
Reputation: 142
You need to use the Custom Search API. There is a handy explorer here. I use urllib2. You also need to create an API key for your application in the developer console.
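In Python 3, urllib2's role is taken by urllib.request. Here is a minimal sketch of one image-search request against the Custom Search JSON API. It assumes an API key from the developer console and a search engine ID (cx) for an engine with image search enabled; the endpoint URL, the searchType/start/num parameters, and the items[].link response field are from my reading of the API reference, so check them there. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(api_key, cx, query, start=1, num=10):
    """Build the request URL for one page of image results."""
    params = {"key": api_key, "cx": cx, "q": query,
              "searchType": "image", "start": start, "num": num}
    return CSE_ENDPOINT + "?" + urllib.parse.urlencode(params)


def image_links(response_json):
    """Extract image URLs from a decoded response; empty list if there are no items."""
    return [item["link"] for item in response_json.get("items", [])]


def search_images(api_key, cx, query, start=1, num=10):
    """Run one request and return the image URLs it found."""
    url = build_image_search_url(api_key, cx, query, start, num)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return image_links(json.load(resp))
```

Each returned link can then be downloaded with urllib.request as in the other answers.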
Upvotes: 0