I have the following code snippet, slightly modified from the original docs. The code works properly when url is set to http://google.com, but it crashes when this is changed to http://www.google.com. The error upon crashing is Failure: twisted.web.client.PartialDownloadError: 200 OK. The traceback is below the code snippet.
Initially I thought that perhaps the code was crashing due to not handling SSL properly. But, looking at the headers this doesn't appear to be the issue. This is my first time ever working with Twisted; I don't know what else could be causing the problem.
Code
from sys import argv
from pprint import pformat

from twisted.internet.task import react
from twisted.web.client import Agent, BrowserLikeRedirectAgent, readBody
from twisted.web.http_headers import Headers
from twisted.internet import reactor
from twisted.internet.ssl import ClientContextFactory

responses = []

class WebClientContextFactory(ClientContextFactory):
    def getContext(self, hostname, port):
        return ClientContextFactory.getContext(self)

def cbBody(r):
    print 'Response body:'
    print r
    responses.append(r)

def cbRequest(response):
    print 'Response version:', response.version
    print 'Response code:', response.code
    print 'Response phrase:', response.phrase
    print 'Response headers:'
    print pformat(list(response.headers.getAllRawHeaders()))
    d = readBody(response)
    d.addCallback(cbBody)
    return d

def main(reactor):
    contextFactory = WebClientContextFactory()
    agent = BrowserLikeRedirectAgent(Agent(reactor, contextFactory))
    url=b"http://google.com/"
    agent = Agent(reactor, contextFactory)
    d = agent.request(
        'GET', url,
        Headers({'User-Agent': ['Twisted Web Client Example']}),
        None)
    d.addCallback(cbRequest)
    return d

react(main)
Traceback
In [1]: %tb
---------------------------------------------------------------------------
SystemExit Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, glob, loc, compiler)
218 else:
219 scripttext = builtin_mod.open(fname).read().rstrip() + '\n'
--> 220 exec(compiler(scripttext, filename, 'exec'), glob, loc)
221
222
/project/demo.py in <module>()
42 return d
43
---> 44 react(main)
/usr/local/lib/python2.7/site-packages/twisted/internet/task.pyc in react(main, argv, _reactor)
902 finished.addBoth(cbFinish)
903 _reactor.run()
--> 904 sys.exit(codes[0])
905
906
SystemExit: 1
Upvotes: 2
Views: 792
It shouldn't be too surprising that requests for different URLs produce different responses. The URLs identify different resources. You should probably expect to get different responses when requesting different resources.
The reason you get a PartialDownloadError when you request http://www.google.com/ is that Google is sending a response with neither a Content-Length nor Transfer-Encoding: chunked in it. This means the only way for the client to know when the response has been received is when the TCP connection is closed. Unfortunately, TCP connections can close for other reasons - so it is ambiguous whether a response is ever fully received.
Google seems to be framing the response this way in response to the particular details of how Agent issues the request. Google responds with Transfer-Encoding: chunked to requests made by other agents.
One option to address this is to decide you don't care if responses are truncated without your knowledge. In this case, add an errback to the readBody Deferred that handles PartialDownloadError. The exception has a response attribute giving you the data that was read up until the TCP connection closed. Grab that data and return it and now you've converted the maybe-failed case into a who-cares-pretend-it-succeeded case.
Another option is to try fiddling with the details of the request until you convince Google to give you a Transfer-Encoding: chunked (or at least a Content-Length). Of course, this solution breaks as soon as you meet another server that doesn't feel like giving you one or the other of these.
Upvotes: 3