Carcigenicate

Reputation: 45806

Do AsyncIO stream writers/readers require manually ensuring that all data is sent/received?

When dealing with sockets, you need to make sure that all data is sent/received, since you may receive incomplete chunks of data when reading. From the docs:

In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.

Emphasis mine. It then shows sample implementations that ensure all data has been handled in each direction.


Is the same true though when dealing with AsyncIO wrappers over sockets?

For read, it seems to be required, as the docs mention that it "[reads] up to n bytes".

For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.

Is this correct? Do I need to check how much data read actually returned, while for writes I can just drain the StreamWriter and know that everything was sent?

I thought that my above assumptions were correct, then I had a look at the example TCP Client immediately below the method docs:

import asyncio

async def tcp_echo_client(message):
    reader, writer = await asyncio.open_connection(
        '127.0.0.1', 8888)

    print(f'Send: {message!r}')
    writer.write(message.encode())

    data = await reader.read(100)
    print(f'Received: {data.decode()!r}')

    print('Close the connection')
    writer.close()

asyncio.run(tcp_echo_client('Hello World!'))

And it doesn't do any kind of checking. It assumes everything is both read and written the first time.

Upvotes: 4

Views: 1721

Answers (1)

user4815162342

Reputation: 155226

For read, [checking for incomplete read] seems to be required, as the docs mention that it "[reads] up to n bytes".

Correct, and this is a useful feature for many kinds of processing, as it allows you to read new data as it arrives from the peer and process it incrementally, without having to know how much to expect at any point. If you do know exactly how much you expect and need to read that amount of bytes, you can use readexactly.
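To make the difference concrete, here is a minimal sketch that feeds a StreamReader by hand instead of using a real socket. The demo function, the chunk boundaries, and the 12-byte message length are assumptions chosen for illustration; read and readexactly are the actual StreamReader methods.

```python
import asyncio


async def demo():
    # read(n) returns whatever is buffered right now, up to n bytes.
    partial_reader = asyncio.StreamReader()
    partial_reader.feed_data(b"Hel")  # simulate only the first chunk arriving
    partial = await partial_reader.read(100)

    # readexactly(n) keeps waiting until n bytes are available.
    exact_reader = asyncio.StreamReader()
    exact_reader.feed_data(b"Hel")
    loop = asyncio.get_running_loop()
    # Simulate the rest of the message arriving a little later.
    loop.call_later(0.01, exact_reader.feed_data, b"lo World!")
    exact = await exact_reader.readexactly(12)

    return partial, exact
```

Running asyncio.run(demo()) should give a short first result and a complete second one. If the peer closes the connection before n bytes arrive, readexactly raises asyncio.IncompleteReadError, whose partial attribute holds what was read so far.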

For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.

This is partially correct. Yes, asyncio will automatically keep writing the data you give it in the background until all is written, so you don't need to (nor can you) ensure it by checking the return value of write.

However, a sequence of stream.write(data); await stream.drain() does not guarantee that all data has been handed to the OS by the time drain() returns. This is because drain doesn't wait for all data to be written. It only blocks if the write buffer has grown past the "high watermark", and then only until the buffer shrinks back to the "low watermark", trying to ensure (misguidedly according to some) that the buffer never becomes empty as long as there are new writes. As far as I know, in current asyncio there is no way to wait until all data has been sent - except for manually tweaking the watermarks, which is inconvenient and which the documentation warns against.
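If you want to see the watermarks and the amount of data still buffered by asyncio, the underlying transport exposes them. The following is only a sketch: the inspect_write_buffer function, the throwaway local server, and the 1024-byte payload are assumptions made for the demo, and the actual numbers depend on the platform and event loop.

```python
import asyncio


async def inspect_write_buffer():
    async def handle(reader, writer):
        await reader.read()  # consume everything until the client closes
        writer.close()

    # Port 0 lets the OS pick a free port; this server exists only for the demo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    low, high = writer.transport.get_write_buffer_limits()

    writer.write(b"x" * 1024)
    # drain() only blocks while the buffer is above the high watermark,
    # so for a small write it normally returns right away.
    await writer.drain()
    pending = writer.transport.get_write_buffer_size()

    writer.close()
    await writer.wait_closed()
    server.close()
    return low, high, pending
```

A non-zero pending value after drain() would mean that asyncio is still holding bytes that have not reached the kernel yet, which is exactly the situation described above.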

This is not as bad as it sounds, simply because a successful write by itself doesn't guarantee that the data was actually transmitted to, let alone received by, the peer - it could be languishing in the socket buffer, or in network equipment along the way. But as long as you can rely on the system to send out the data you gave it as fast as possible, you don't really care whether some of it is in an asyncio buffer or in a kernel buffer. (But you still need to await drain() to ensure backpressure.)

The one time you do care is when you are about to exit the program or the event loop; in that case, a portion of the data being stuck in an asyncio buffer means that the peer will never see it. This is why, starting with Python 3.7, asyncio provides a wait_closed() method which you can await after calling close() to ensure that all the data has been sent. One could imagine a flush() method that does the same without actually closing the socket (analogous to the method of the same name on file objects, and with equivalent semantics), but currently there are no plans to add it.
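Putting the pieces together, a sender could look roughly like the sketch below: drain() after each write for backpressure, then close() followed by wait_closed() before returning. The send_all and demo names, the local counting server, and the chunk sizes are assumptions for illustration, not part of asyncio.

```python
import asyncio


async def send_all(host, port, chunks):
    reader, writer = await asyncio.open_connection(host, port)
    try:
        for chunk in chunks:
            writer.write(chunk)
            await writer.drain()  # backpressure: pause if the buffer is too full
    finally:
        writer.close()
        await writer.wait_closed()  # let asyncio flush its buffer before we return


async def demo():
    received = bytearray()
    done = asyncio.Event()

    async def handle(reader, writer):
        received.extend(await reader.read())  # read() with no size reads until EOF
        writer.close()
        done.set()

    # Port 0 lets the OS pick a free port; this server exists only for the demo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    chunks = [b"x" * 65536] * 32
    await send_all("127.0.0.1", port, chunks)
    await asyncio.wait_for(done.wait(), timeout=5)
    server.close()
    return len(received)
```

With asyncio.run(demo()), the server side should see every byte that was written, even though the client never checks a return value from write.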

Upvotes: 4
