Dan Tao

Reputation: 128307

How can I trigger an IncompleteRead (on purpose) in a Python web application?

I've got some Python code that makes requests using the requests library and occasionally experiences an IncompleteRead error. I'm trying to update this code to handle this error more gracefully and would like to test that it works, so I'm wondering how to actually trigger the conditions under which IncompleteRead is raised.

I realize I can do some mocking in a unit test; I'd just like to actually reproduce the circumstances (if I can) under which this error was previously occurring and ensure my code is able to deal with it properly.

Upvotes: 1

Views: 1309

Answers (3)

Mark Amery

Reputation: 154715

By looking at the places where raise IncompleteRead appears at https://github.com/python/cpython/blob/v3.8.0/Lib/http/client.py, I think the standard library's http.client module (named httplib back in Python 2) raises this exception in only the following two circumstances:

  • When a response's body is shorter than claimed by the response's Content-Length header, or
  • When a chunked response claims that the next chunk is of length n, but there are fewer than n bytes remaining in the response body.

If you install Flask (pip install Flask), you can paste this into a file to create a test server you can run with endpoints that artificially create both of these circumstances:

from flask import Flask, make_response

app = Flask(__name__)

@app.route('/test')
def send_incomplete_response():
    response = make_response('fourteen chars')
    response.headers['Content-Length'] = '10000'
    return response

@app.route('/test_chunked')
def send_chunked_response_with_wrong_sizes():
    # Example response based on
    # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
    # but with the stated size of the second chunk increased to 900
    resp_text = """7\r\nMozilla\r\n900\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n"""
    response = make_response(resp_text)
    response.headers['Transfer-Encoding'] = 'chunked'
    return response

app.run()

and then test them with http.client:

>>> import http.client
>>> 
>>> conn = http.client.HTTPConnection('localhost', 5000)
>>> conn.request('GET', '/test')
>>> response = conn.getresponse()
>>> response.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(14 bytes read, 9986 more expected)
>>> 
>>> conn = http.client.HTTPConnection('localhost', 5000)
>>> conn.request('GET', '/test_chunked')
>>> response = conn.getresponse()
>>> response.read()
Traceback (most recent call last):
  File "/usr/lib/python3.8/http/client.py", line 571, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(28 bytes read, 2276 more expected)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/http/client.py", line 461, in read
    return self._readall_chunked()
  File "/usr/lib/python3.8/http/client.py", line 575, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(7 bytes read)

In real life, the most likely reason this might happen sporadically is if a connection was closed early by the server. For example, you can also try running this Flask server, which sends a response body very slowly, with a total of 20 seconds of sleeping:

from flask import Flask, Response
from time import sleep

app = Flask(__name__)

@app.route('/test_generator')
def send_response_with_delays():
    def generate():
        yield 'foo'
        sleep(10)
        yield 'bar'
        sleep(10)
        yield 'baz'

    response = Response(generate())
    response.headers['Content-Length'] = '9'
    return response

app.run()

If you run that server in a terminal, then initiate a request to it and start reading the response like this...

>>> import http.client
>>> conn = http.client.HTTPConnection('localhost', 5000)
>>> conn.request('GET', '/test_generator')
>>> response = conn.getresponse()
>>> response.read()

... and then flick back to the terminal running your server and kill it (e.g. with CTRL-C, on Unix), then you'll see your .read() call error out with a familiar message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(6 bytes read, 3 more expected)

Other, less probable causes include your server systematically generating an incorrect Content-Length header (maybe due to some broken handling of Unicode), or your Content-Length header (or the lengths included in a chunked message) being corrupted in transit.

Okay, that covers the standard library. What about Requests? Requests by default defers its work to urllib3 which in turn defers to http.client, so you might expect the exception from http.client to simply bubble up when using Requests. However, life is more complicated than that, for two reasons:

  1. Both urllib3 and requests catch exceptions in the layer beneath them and raise their own versions. For instance, there are urllib3.exceptions.IncompleteRead and requests.exceptions.ChunkedEncodingError.

  2. The current handling of Content-Length checking across all three of these modules is horribly broken, and has been for years. I've done my best to explain it in detail at https://github.com/psf/requests/issues/4956#issuecomment-573325001 if you're interested, but the short version is this: http.client won't check Content-Length if you call .read(123) instead of just .read(); urllib3 may or may not check, depending on various complicated details of how you call it; and Requests, as a consequence of the previous two issues, currently doesn't check it at all, ever. This hasn't always been the case, though: attempts to fix it have been made and then reverted, so perhaps at some point in the past (like when this question was asked in 2016) the state of play was a bit different. Oh, and for extra confusion, while urllib3 has its own version of the exception, it still sometimes lets the standard library's IncompleteRead bubble up, just to mess with you.

Hopefully, point 2 will get fixed in time - I'm having a go right now at nudging it in that direction. Point 1 will remain a complication, but the conditions that trigger these exceptions - whether the underlying http.client.IncompleteRead or the urllib3 or requests alternatives - should remain as I describe at the start of this answer.
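Given point 1, code that wants to handle a truncated response may need to catch more than one exception class. Here is a minimal sketch of one way to do that, not a definitive recipe. The names INCOMPLETE_READ_ERRORS and read_or_none are hypothetical, and the urllib3 and Requests imports are optional so that the snippet also runs where those packages aren't installed:

```python
import http.client

# The standard library's exception is always available.
_errors = [http.client.IncompleteRead]

# Add the urllib3 and Requests variants mentioned in point 1, if installed.
try:
    import urllib3.exceptions
    _errors.append(urllib3.exceptions.IncompleteRead)
except ImportError:
    pass

try:
    import requests.exceptions
    _errors.append(requests.exceptions.ChunkedEncodingError)
except ImportError:
    pass

# Hypothetical name: every "the body was cut short" exception we know about.
INCOMPLETE_READ_ERRORS = tuple(_errors)


def read_or_none(read):
    """Call read() and return its result, or None if the body was truncated.

    `read` is any zero-argument callable, e.g. `response.read` or
    `lambda: requests.get(url).content`.
    """
    try:
        return read()
    except INCOMPLETE_READ_ERRORS:
        return None
```

Whether returning None, retrying, or re-raising is the right reaction depends on your application; the tuple of exception classes is the part that matters here.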

Upvotes: 1

salezica

Reputation: 76929

Adding a second answer, more to the point this time. I took a dive into some source code, and found information that may help.

The IncompleteRead exception bubbles up from httplib (http.client in Python 3), part of the Python standard library. Most likely, it comes from this function:

def _safe_read(self, amt):
    """
    Read the number of bytes requested, compensating for partial reads.
    Normally, we have a blocking socket, but a read() can be interrupted
    by a signal (resulting in a partial read).

    Note that we cannot distinguish between EOF and an interrupt when zero
    bytes have been read. IncompleteRead() will be raised in this
    situation.

    This function should be used when <amt> bytes "should" be present for
    reading. If the bytes are truly not available (due to EOF), then the
    IncompleteRead exception can be used to detect the problem.
    """

So, either the socket was closed before the HTTP response was consumed, or the reader tried to get too many bytes out of it. Judging by search results (so take this with a grain of salt), there is no other arcane situation that can make this happen.

The first scenario can be debugged with strace. If I'm reading this correctly, the 2nd scenario can be caused by the requests module, if:

  • A Content-Length header is present that exceeds the actual amount of data sent by the server.
  • A chunked response is incorrectly assembled (has an erroneous chunk-size line before one of the chunks), or a regular response is being interpreted as chunked.

This function, from urllib3's response handling, raises the exception when a chunk-size line cannot be parsed as a hexadecimal number:

def _update_chunk_length(self):
    # First, we'll figure out length of a chunk and then
    # we'll try to read it from socket.
    if self.chunk_left is not None:
        return
    line = self._fp.fp.readline()
    line = line.split(b';', 1)[0]
    try:
        self.chunk_left = int(line, 16)
    except ValueError:
        # Invalid chunked protocol response, abort.
        self.close()
        raise httplib.IncompleteRead(line)

Try checking the Content-Length header of your buffered responses, or the chunk format of your chunked responses.

To produce the error, try:

  • Forcing an invalid Content-Length
  • Using the chunked response protocol, with a too-large chunk size declared at the beginning of a chunk
  • Closing the socket mid-response
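
The three approaches above can be sketched using only the standard library. This is a rough illustration rather than a production test server: serve_once and trigger are hypothetical helper names, and the raw responses are hand-written. Note that an overstated Content-Length and a socket closed mid-response look the same to the client, since in both cases the body ends before the promised number of bytes arrives.

```python
import http.client
import socket
import threading


def serve_once(payload):
    """Send `payload` to the first client that connects, then hang up.

    Returns the port number the throwaway server is listening on.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def handle():
        conn, _ = listener.accept()
        with conn, listener:
            request = b""
            while b"\r\n\r\n" not in request:  # drain the request headers
                data = conn.recv(4096)
                if not data:
                    break
                request += data
            conn.sendall(payload)
        # Leaving the `with` block closes the socket, cutting the body short.

    threading.Thread(target=handle, daemon=True).start()
    return port


def trigger(payload):
    """Return the IncompleteRead raised while reading `payload`, or None."""
    conn = http.client.HTTPConnection("127.0.0.1", serve_once(payload), timeout=5)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
    except http.client.IncompleteRead as exc:
        return exc
    finally:
        conn.close()
    return None


# Content-Length promises 9 bytes, but only 6 arrive before the socket closes.
TRUNCATED = b"HTTP/1.1 200 OK\r\nContent-Length: 9\r\n\r\nfoobar"

# The second chunk claims 0x900 bytes but far fewer follow.
BAD_CHUNK = (
    b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
    b"7\r\nMozilla\r\n900\r\nDeveloper\r\n0\r\n\r\n"
)

for name, payload in [("truncated", TRUNCATED), ("bad chunk", BAD_CHUNK)]:
    print(name, repr(trigger(payload)))
```

To exercise your own error handling, point your client code at the port returned by serve_once instead of calling trigger.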

Upvotes: 1

salezica

Reputation: 76929

When testing code that relies on external behavior (such as server responses, system sensors, etc.), the usual approach is to fake the external factors instead of working to produce them.

Create a test version of the function or class you're using to make HTTP requests. If you're using requests directly across your codebase, stop: direct coupling with libraries and external services is very hard to test.

You mention that you want to make sure your code can handle this exception, and you'd rather avoid mocking for this reason. Mocking is just as safe, as long as you're wrapping the modules you need to mock all across your codebase. If you can't mock to test, you're missing layers in your design (or asking too much of your testing suite).

So, for example:

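from http.client import IncompleteRead

class FooService(object):
    def make_request(self, *args):
        # Use requests to perform HTTP requests here.
        # NOBODY uses requests directly without passing through here.
        raise NotImplementedError

class MockFooService(FooService):
    def make_request(self, *args):
        # IncompleteRead requires the partial data read so far.
        raise IncompleteRead(b'')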

The 2nd class is a testing utility written solely for the purpose of testing this specific case. As your tests grow in coverage and completeness, you may need more sophisticated language (to avoid incessant subclassing and repetition), but it's usually good to start with the simplest code that will read easily and test the desired cases.
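
As one possible next step beyond hand-written subclasses, the same idea can be sketched with unittest.mock from the standard library. Everything named here is an assumption for illustration: FooService stands in for your own wrapper, and fetch_with_retry is a hypothetical piece of error handling that retries when the body arrives truncated.

```python
import http.client
import unittest
from unittest import mock


class FooService:
    """Stand-in for the wrapper class; real code would call requests here."""

    def make_request(self, url):
        raise NotImplementedError


def fetch_with_retry(service, url, attempts=3):
    """Call service.make_request, retrying if the response was cut short."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.make_request(url)
        except http.client.IncompleteRead as exc:
            last_error = exc  # remember the failure and try again
    raise last_error


class FetchWithRetryTest(unittest.TestCase):
    def test_retries_after_incomplete_read(self):
        # The mock fails on the first call and succeeds on the second.
        service = mock.create_autospec(FooService, instance=True)
        service.make_request.side_effect = [
            http.client.IncompleteRead(b"par"),
            b"complete body",
        ]

        body = fetch_with_retry(service, "http://example.invalid/")

        self.assertEqual(body, b"complete body")
        self.assertEqual(service.make_request.call_count, 2)

    def test_gives_up_after_all_attempts_fail(self):
        service = mock.create_autospec(FooService, instance=True)
        service.make_request.side_effect = http.client.IncompleteRead(b"")

        with self.assertRaises(http.client.IncompleteRead):
            fetch_with_retry(service, "http://example.invalid/", attempts=2)

        self.assertEqual(service.make_request.call_count, 2)
```

Saved to a file, this can be run with python -m unittest followed by the file name.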

Upvotes: 0
