Reputation: 7920
I looked here and here for information on my issue, but with no luck.
I made some python code that is intended to grab a webpage's source, as in Safari's Web Inspector. However, I have been getting different code from my application and Safari's Web Inspector. Here is my code so far:
#!/usr/bin/python
import urllib2
# headers
hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Cache-Control': 'max-age=0'}
# request data
req = urllib2.Request("https://www.google.com/#q=rainbow&safe=active", headers=hdr)
# try to get data
try:
page = urllib2.urlopen(req)
print page.info()
except urllib2.HTTPError, e:
print e.fp.read()
content = page.read()
#print content
print content
And the headers match up to what is in Web Inspector:
The code returned is different, though, for a google search for "rainbow".
My python:
http://paste.ubuntu.com/6270549/
Web Inspector:
http://paste.ubuntu.com/6270606/
As far as I know, it seems that my code is missing a large number of the ubiquitous }catch(e){gbar_._DumpException(e)}
lines that are present in the Web Inspector code. Also, my code only has 78 lines, while the Web Inspector code has 235 lines. Does this mean that my code is not getting all of the javascript or some other portion of the webpage? How can I get my code retrieve the same data as the Web Inspector?
Upvotes: 1
Views: 227
Reputation: 7920
You are using the wrong link to search with google search- the correct link should be:
https://www.google.com/search?q=rainbow&safe=active
instead of:
https://www.google.com/#q=rainbow&safe=active
The second link will cause a redirect to Google's homepage when used in python, because it is incorrect (for some reason) when not used in Safari. This is why the code is different.
Upvotes: 1