I'm using mechanize in Python to do some web scraping. Most of the site works, but one particular page comes back empty: the server answers 200 OK with no content.
My settings are:

```python
import mechanize

self._browser = mechanize.Browser()
self._browser.set_handle_refresh(True)
self._browser.set_debug_responses(True)
self._browser.set_debug_redirects(True)
self._browser.set_debug_http(True)
```
and the code that performs the request is:

```python
response = self._browser.open(url)
```
This is the debug output:
```text
add_cookie_header
Checking xyz.com for cookies to return
- checking cookie path=/
- checking cookie <Cookie ASP.NET_SessionId=j3pg0wnavh3yjseyj1v3mr45 for xyz.com/>
it's a match
send: 'GET /page.aspx?leagueID=39 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: xyz.com\r\nCookie: ASP.NET_SessionId=aapg9wnavh3yqyrtg1v3ar45\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 07 Feb 2012 19:04:37 GMT
header: Pragma: no-cache
header: Expires: -1
header: Connection: close
header: Cache-Control: no-cache
header: Content-Length: 0
extract_cookies: Date: Tue, 07 Feb 2012 19:04:37 GMT
Pragma: no-cache
Expires: -1
Connection: close
Cache-Control: no-cache
Content-Length: 0
```
I've tried with redirect handling both enabled and disabled, to no avail. Any ideas?
I should add that the page loads fine in a regular browser.
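The usual way to find out what the problem is, is to compare the traffic of a client that works with the traffic of the one that doesn't:
1. Capture the headers sent and received when the page is loaded in a browser.
2. Capture the headers sent and received when the page is requested by your script.
The differences between the two captures usually point to the cause.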
For the first step, there are many tools available. For example, in Firefox, the HttpFox and Live HTTP Headers add-ons can be quite useful.
For the second step, programmatically logging the headers being sent/received should be enough.
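If you want to see exactly what a client puts on the wire without relying on a browser add-on, one option is to point it at a tiny local server that records the request headers. The sketch below is an assumption-laden illustration, not mechanize functionality: it uses only the standard library (`http.server`, `urllib.request`), and the names `HeaderRecorder` and `capture_headers` are hypothetical. On the mechanize side, `set_debug_http(True)`, which the question already enables, prints the same request information.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderRecorder(BaseHTTPRequestHandler):
    """Hypothetical handler: stores the incoming request headers on the
    server object and replies with a tiny plain-text body."""

    def do_GET(self):
        self.server.captured = dict(self.headers.items())
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request log line on stderr


def capture_headers(user_agent):
    """Serve exactly one request on a free localhost port, send a GET
    to it with the given User-Agent, and return the headers received."""
    server = HTTPServer(("127.0.0.1", 0), HeaderRecorder)  # port 0: OS picks a free port
    worker = threading.Thread(target=server.handle_request)  # handles one request, then returns
    worker.start()
    url = "http://127.0.0.1:%d/page.aspx?leagueID=39" % server.server_port
    # Empty ProxyHandler so any http_proxy setting doesn't reroute the local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request) as response:
        response.read()
    worker.join()
    server.server_close()
    return server.captured


if __name__ == "__main__":
    for name, value in capture_headers("Mozilla/5.0 (test)").items():
        print("%s: %s" % (name, value))
```

To inspect a different client, you could run only the server part and open its URL from that client instead of from `urllib`.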
For both steps, you can also capture the traffic on your network interface with a tool such as Wireshark.
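Once both captures are available, the comparison itself is mechanical. The sketch below parses the raw request string that mechanize printed after `send:` in the question and lists the headers the browser sent that the script did not, plus headers whose values differ. The helper names `parse_raw_request` and `diff_headers` are hypothetical, and the `BROWSER` dict is a made-up placeholder, not a real capture: replace it with what HttpFox, Live HTTP Headers or Wireshark actually shows. Header names are compared case-insensitively, as HTTP requires.

```python
def parse_raw_request(raw):
    """Split a raw HTTP request (request line + headers) into
    (request_line, {lower-cased header name: value})."""
    head = raw.split("\r\n\r\n", 1)[0]  # drop the blank line and any body
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def diff_headers(browser, script):
    """Return (missing, different): headers the browser sent that the
    script did not, and headers present in both with different values."""
    b = {k.lower(): v for k, v in browser.items()}
    s = {k.lower(): v for k, v in script.items()}
    missing = sorted(k for k in b if k not in s)
    different = sorted(k for k in b if k in s and b[k] != s[k])
    return missing, different


# Request exactly as printed by mechanize's debug output in the question.
SCRIPT_RAW = (
    "GET /page.aspx?leagueID=39 HTTP/1.1\r\n"
    "Accept-Encoding: identity\r\n"
    "Host: xyz.com\r\n"
    "Cookie: ASP.NET_SessionId=aapg9wnavh3yqyrtg1v3ar45\r\n"
    "Connection: close\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.2 "
    "(KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2\r\n\r\n"
)

# HYPOTHETICAL browser capture: placeholder values, replace with your own.
BROWSER = {
    "Host": "xyz.com",
    "Accept": "text/html,application/xhtml+xml",
    "Referer": "http://xyz.com/",
    "Connection": "keep-alive",
}

if __name__ == "__main__":
    request_line, script_headers = parse_raw_request(SCRIPT_RAW)
    missing, different = diff_headers(BROWSER, script_headers)
    print(request_line)
    print("missing from script:", missing)
    print("different values:", different)
```

Any header that shows up as missing or different is a candidate to add to the script's requests (for example via the browser's header list) and retest.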