Reputation: 45
I wrote Python code to search Google by image, combined with some Google dork keywords. Here is the code:
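import sys
import webbrowser
import requests

# Qt imports and the rest of the Window class are omitted; the two functions below are its methods.
def showD(self):
    self.text, ok = QInputDialog.getText(self, 'Write A Keyword', 'Example:"twitter.com"')
    if ok:
        self.google()

def google(self):
    filePath = self.imagePath
    domain = self.text
    searchUrl = 'http://www.google.com/searchbyimage/upload'
    # Upload the image; the context manager closes the file afterwards.
    with open(filePath, 'rb') as imageFile:
        multipart = {'encoded_image': (filePath, imageFile), 'image_content': '', 'q': f'site:{domain}'}
        response = requests.post(searchUrl, files=multipart, allow_redirects=False)
    # Google answers with a redirect to the results page.
    fetchUrl = response.headers['Location']
    webbrowser.open(fetchUrl)

App = QApplication(sys.argv)
window = Window()
sys.exit(App.exec())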
I just couldn't figure out how to display the URLs of the search results in my program. I tried this code:
import requests
from bs4 import BeautifulSoup

query = "twitter"
search = query.replace(' ', '+')
results = 15
url = f"https://www.google.com/search?q={search}&num={results}"
requests_results = requests.get(url)
soup_link = BeautifulSoup(requests_results.content, "html.parser")
# Only consider anchors that actually have an href attribute.
links = soup_link.find_all("a", href=True)
for link in links:
    link_href = link.get('href')
    if "url?q=" in link_href and "webcache" not in link_href:
        title = link.find_all('h3')
        if len(title) > 0:
            print(link_href.split("?q=")[1].split("&sa=U")[0])
            # print(title[0].getText())
            print("------")
But it only works for a normal Google keyword search, and it fails when I try to adapt it to the results of a Google image search: it doesn't display any results.
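As a side note on the link extraction in the snippet above: splitting on "?q=" and "&sa=U" depends on the exact parameter order. A minimal sketch of a sturdier variant, assuming the hrefs still have the "/url?q=..." redirect form, could use the standard library's urllib.parse. The name extract_target is hypothetical and only used for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the destination of a '/url?q=...' redirect link, or None.

    Hypothetical helper: it assumes the link has the redirect form that
    the snippet above filters for.
    """
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    # parse_qs splits the query string and percent-decodes the values.
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


if __name__ == "__main__":
    print(extract_target("/url?q=https://twitter.com/home&sa=U&ved=abc"))
```

This only changes how a single href is parsed; it does not make Google return image-search results to a plain HTTP client.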
Upvotes: 2
Views: 859
Reputation: 3860
Currently there is no simple way to scrape Google's "Search by image" using plain HTTPS requests. Before responding to this type of request, Google presumably checks whether the user is real using several sophisticated techniques. Even your working example does not work for long: Google tends to block it after 20-100 requests.
All public Python solutions that actually scrape Google image search use Selenium and imitate real user behaviour, so you can go this way yourself. The Python Selenium bindings are not hard to get used to, except perhaps for the setup process.
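To give an idea of what the Selenium route might look like, here is a rough sketch, not a tested scraper. It assumes Selenium 4 with a working Chrome driver, and that fetch_url is the redirect URL your google() method already reads from the Location header. The function names and the "skip anything hosted on a google domain" filter are my own assumptions; the real results page may need consent-dialog handling or more specific selectors.

```python
from urllib.parse import urlparse


def is_external_result(href):
    """Keep only http(s) links that do not point back at a Google host."""
    parsed = urlparse(href)
    if parsed.scheme not in ("http", "https"):
        return False
    return bool(parsed.netloc) and "google" not in parsed.netloc


def collect_result_links(fetch_url, wait_seconds=5.0):
    """Open fetch_url in a real browser and return the external links found."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        # Let element lookups wait up to wait_seconds for the page to render.
        driver.implicitly_wait(wait_seconds)
        driver.get(fetch_url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        hrefs = [anchor.get_attribute("href") for anchor in anchors]
    finally:
        driver.quit()
    return [href for href in hrefs if href and is_external_result(href)]
```

In your program you would call collect_result_links(fetchUrl) instead of webbrowser.open(fetchUrl) and show the returned list in the window.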
The best of them, to my taste, is hardikvasa/google-images-download (7.8K stars on GitHub). Unfortunately, this library does not accept an image path or binary image data as input. It only has the similar_images parameter, which expects a URL. Nevertheless, you can try to use it with a http://localhost:1234/... URL (you can easily set one up this way).
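One possible way to get such a localhost URL is Python's built-in http.server module. The sketch below serves a directory on port 1234 in a background thread; the function name serve_directory and the directory name are hypothetical. Whether Google can actually work with a URL that is only reachable from your own machine is not something this sketch verifies.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=1234):
    """Serve the files in `directory` at http://127.0.0.1:<port>/ in the background."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread does not keep the program alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    # "images" is a placeholder for the folder that holds the query image.
    server = serve_directory("images")
    input("Serving on http://localhost:1234/ (press Enter to stop)")
    server.shutdown()
```

An image stored as images/query.jpg would then be available at http://localhost:1234/query.jpg for as long as the server runs.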
You can check all these questions and see that all the solutions use Selenium for this task.
Upvotes: 2