Reputation: 81
I am trying to make a script that gets similar images from google using a url, using a part from this code.
The problem is, that I want to get to this link, because from it I can get to the images themselves by cloicking on the "search by image" link, but when I use the script, I get the exact same page, but without the "search by image" link.
I would like to know why and if there is a way to fix it.
Thanks a lot in advance!
P.S. Here's the code
import os
from urllib2 import Request, urlopen
from cookielib import LWPCookieJar
USER_AGENT = r"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)"
LOCAL_PATH = r"C:\scripts\google_search"
COOKIE_JAR_FILE = r".google-cookie"
class google_search(object):
def cleanup(self):
if os.path.isfile(self.cookie_jar_path):
os.remove(self.cookie_jar_path)
os.chdir(LOCAL_PATH)
for html in os.listdir("."):
if html.endswith(".html"):
os.remove(html)
def __init__(self, cookie_jar_path):
self.cookie_jar_path = cookie_jar_path
self.cookie_jar = LWPCookieJar(self.cookie_jar_path)
self.counter = 0
self.cleanup()
try:
cookie.load()
except Exception:
pass
def get_html(self, url):
request = Request(url = url)
request.add_header("User-Agent", USER_AGENT)
self.cookie_jar.add_cookie_header(request)
response = urlopen(request)
self.cookie_jar.extract_cookies(response, request)
html_response = response.read()
response.close()
self.cookie_jar.save()
return html_response
def main():
url_2 = r"http://www.google.com/search?hl=en&q=http%3A%2F%2Fi.imgur.com%2FqGRxTNA.jpg&btnG=Google+Search"
search = google_search(os.path.join(LOCAL_PATH, COOKIE_JAR_FILE))
html_2 = search.get_html(url_2)
if __name__ == '__main__':
main()
Upvotes: 1
Views: 87
Reputation: 44444
I have tried something of that sort a few weeks back. My server used to reject my requests with a 404 because I was not setting a proper user agent.
In your case, you are not setting the user agent properly. Pasting my User-Agent header.
USER_AGENT = r"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36"
PS: I hope you have read the T & C of Google. You might be violating the terms.
Upvotes: 1