I am unsure why hosting this simple code on Google App Engine returns a server error whenever a query is submitted through the form. The problem seems to be the line html = urllib2.urlopen("http://google.com/search?q=" + q).read(), because the code works fine without it.
import webapp2
import urllib2

form = """
<form action="/process">
<input name="q">
<input type="submit">
</form>
"""

class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write(form)

class ProcessHandler(webapp2.RequestHandler):
    def get(self):
        q = self.request.get("q")
        html = urllib2.urlopen("http://google.com/search?q=" + q).read()
        self.response.out.write(html)

app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/process', ProcessHandler)],
                              debug=True)
This is the error returned:
Error: Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it.
Upvotes: 1
Views: 424
Reputation: 1711
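Google is returning a 403 (Forbidden) to your search request. Note that the traceback passes through http_error_302: google.com first redirects, and the redirected request is then refused: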
>>> import urllib2
>>> html = urllib2.urlopen("http://google.com/search?q=Test").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
This, however, works:
html = urllib2.urlopen("http://google.com").read()
So it looks like Google is trying to block this kind of automated searching. As the other poster suggested, changing the User-Agent string might avoid the 403, so pick something common.
I've just tested with a Mozilla user agent set, and I get the results I think you are looking for:
import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://google.com/search?q=Test', None, headers)
html = urllib2.urlopen(req).read()
print html
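A minimal sketch extending the snippet above, assuming you also want two extra things the original handler lacks: the query URL-encoded (the question concatenates q raw, so a space or "&" in the search box would corrupt the URL) and the HTTP status reported instead of an unhandled exception turning into App Engine's generic "Server Error". The names build_search_request and fetch_search are hypothetical. It uses only the standard library and runs on Python 2.7 (urllib/urllib2, as in the question) or Python 3. Whether Google actually accepts the request is not guaranteed; a 403 may still come back.

```python
# Standard-library imports that work on both Python 3 and Python 2.7.
try:
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError
except ImportError:  # Python 2.7, as used in the question
    from urllib import urlencode
    from urllib2 import Request, urlopen, HTTPError

SEARCH_URL = "http://google.com/search"  # same endpoint as in the question
USER_AGENT = "Mozilla/5.0"               # common browser-like string, as suggested above


def build_search_request(query):
    """Hypothetical helper: encoded query string plus a browser-like User-Agent."""
    url = SEARCH_URL + "?" + urlencode({"q": query})
    return Request(url, None, {"User-Agent": USER_AGENT})


def fetch_search(query):
    """Hypothetical helper: return (status_code, body) instead of raising on 403."""
    try:
        response = urlopen(build_search_request(query))
        return response.getcode(), response.read()
    except HTTPError as err:
        # e.g. 403 Forbidden: report it rather than crashing the handler
        return err.code, b""
```

In the App Engine handler you would call fetch_search(q) and write either the body or a message containing the status code, so a blocked request shows up as "403" rather than as a bare server error.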
Upvotes: 0
Reputation: 506
www.google.com probably doesn't accept this kind of direct connection and rejects requests coming from particular user agents. In a plain Python environment you could change the User-Agent string, but I think it's not possible to do that through Google App Engine.
Upvotes: 1