Twisted giving twisted.web.client.PartialDownloadError: 200 OK

Question

I have the following code snippet, slightly modified from the example in the Twisted docs. It works when url is set to http://google.com, but it fails when url is changed to http://www.google.com, with the error Failure: twisted.web.client.PartialDownloadError: 200 OK. The traceback is below the code snippet.

At first I thought the code might be failing because it wasn't handling SSL properly, but judging by the response headers, that doesn't appear to be the issue. This is my first time working with Twisted, and I don't know what else could be causing the problem.

Code

from sys import argv
from pprint import pformat
from twisted.internet.task import react
from twisted.web.client import Agent, BrowserLikeRedirectAgent, readBody
from twisted.web.http_headers import Headers
from twisted.internet import reactor
from twisted.internet.ssl import ClientContextFactory

responses = []

class WebClientContextFactory(ClientContextFactory):
    def getContext(self, hostname, port):
        return ClientContextFactory.getContext(self)

def cbBody(r):
    print 'Response body:'
    print r
    responses.append(r)

def cbRequest(response):
    print 'Response version:', response.version
    print 'Response code:', response.code
    print 'Response phrase:', response.phrase
    print 'Response headers:'
    print pformat(list(response.headers.getAllRawHeaders()))
    d = readBody(response)
    d.addCallback(cbBody)
    return d

def main(reactor):
    contextFactory = WebClientContextFactory()
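    # Note: this redirect-following agent is never used; `agent` is
    # reassigned to a plain Agent two lines below.
    agent = BrowserLikeRedirectAgent(Agent(reactor, contextFactory))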
    url=b"http://google.com/"
    agent = Agent(reactor, contextFactory)
    d = agent.request(
        'GET', url,
        Headers({'User-Agent': ['Twisted Web Client Example']}),
        None)
    d.addCallback(cbRequest)
    return d

react(main)

Traceback

In [1]: %tb
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, glob, loc, compiler)
    218             else:
    219                 scripttext = builtin_mod.open(fname).read().rstrip() + '\n'
--> 220                 exec(compiler(scripttext, filename, 'exec'), glob, loc)
    221
    222

/project/demo.py in ()
     42     return d
     43
---> 44 react(main)

/usr/local/lib/python2.7/site-packages/twisted/internet/task.pyc in react(main, argv, _reactor)
    902     finished.addBoth(cbFinish)
    903     _reactor.run()
--> 904     sys.exit(codes[0])
    905
    906

SystemExit: 1

Jean-Paul Calderone · Accepted Answer

It shouldn't be too surprising that requests for different URLs produce different responses. The URLs identify different resources. You should probably expect to get different responses when requesting different resources.

The reason you get a PartialDownloadError when you request http://www.google.com/ is that Google is sending a response with neither a Content-Length nor Transfer-Encoding: chunked in it. This means the only way for the client to know that the response is complete is that the TCP connection closes. Unfortunately, TCP connections can close for other reasons, so it is ambiguous whether a response is ever fully received. Twisted reports this as possible data loss, and readBody turns it into a PartialDownloadError carrying the response's status code and phrase, which is why the error reads "200 OK".
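To make the framing rules concrete, here is a simplified, stdlib-only sketch of how an HTTP/1.1 client decides where a response body ends, loosely following RFC 7230 section 3.3.3. It is not Twisted's actual parser; the function names and the dict-of-lower-case-headers input are assumptions made for illustration, and special cases such as HEAD requests and 1xx/204/304 responses are ignored.

```python
# Simplified sketch of HTTP/1.1 response-body framing, loosely following
# RFC 7230 section 3.3.3. Not Twisted's parser; ignores HEAD requests,
# 1xx/204/304 responses and malformed Content-Length values.


def body_framing(headers):
    """Return how the client will detect the end of the response body.

    `headers` is assumed to be a dict of lower-case header names to values.
    """
    te = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in te.split(",") if c.strip()]
    if codings:
        # Transfer-Encoding overrides Content-Length; only a final
        # "chunked" coding makes the body self-delimiting.
        return "chunked" if codings[-1] == "chunked" else "close-delimited"
    if "content-length" in headers:
        return "content-length"  # body ends after exactly N bytes
    return "close-delimited"  # body ends only when the server closes TCP


def read_with_length(reads, length):
    """Content-Length framing: a short read is detectable as truncation."""
    body = b"".join(reads)
    if len(body) < length:
        raise ValueError("truncated: got %d of %d bytes" % (len(body), length))
    return body[:length]


def read_until_close(reads):
    """Close-delimited framing: everything received before EOF is the body.

    `reads` stands in for successive socket reads; running out of reads
    plays the role of the connection closing, for whatever reason.
    """
    return b"".join(reads)


# A dropped connection and a complete response are indistinguishable here:
complete = read_until_close([b"<html>", b"</html>"])
truncated = read_until_close([b"<html>"])  # connection died early
```

The last two readers show the asymmetry: with a Content-Length, a short read can be reported as truncation, while for a close-delimited body a dropped connection and a complete response both "succeed", so a careful client can only say the data is possibly partial.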

Google seems to choose this framing because of the particular details of how Agent issues the request; it responds to requests made by other clients with Transfer-Encoding: chunked.

One option is to decide you don't care if responses are truncated without your knowledge. In that case, add an errback to the readBody Deferred that handles PartialDownloadError. The exception has a response attribute holding the data that was read before the TCP connection closed. Return that data from the errback, and you've converted the maybe-failed case into a who-cares-pretend-it-succeeded case.

Another option is to try fiddling with the details of the request until you convince Google to give you a Transfer-Encoding: chunked (or at least a Content-Length). Of course, this solution breaks as soon as you meet another server that doesn't feel like giving you one or the other of these.
