joelc

Best way to read a stream to a delimiter and no farther

My code has to consume data from a NetworkStream, and the data read from the stream will contain three parts: metadata, a well-known delimiter, and data.

I'm trying to find the most efficient way to read from the NetworkStream up to the end of the delimiter. The metadata is usually a few hundred bytes (but can be as small as 32 bytes), the delimiter is a specific 2-byte sequence, and the data can range from zero bytes to several gigabytes (the metadata specifies the data length). I must read only up to the delimiter: the rest of the stream (the payload) is consumed elsewhere, NetworkStream doesn't support seeking, and the payload can be too large to dump into a MemoryStream.
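Because the payload can be gigabytes long, one way to consume it after the delimiter has been found is to copy exactly the advertised number of bytes through a small fixed buffer, so it never has to fit in memory. This is a minimal sketch, assuming the payload length has already been parsed from the metadata as a `long`; `PayloadCopier`, `CopyExactly`, and the 81,920-byte default buffer size are hypothetical choices, not part of the question.

```csharp
using System;
using System.IO;

static class PayloadCopier
{
    // Hypothetical helper: copies exactly `count` bytes from `source` to
    // `destination` through a fixed-size buffer, so a multi-gigabyte payload
    // is streamed rather than held in memory. Never reads past `count`.
    public static void CopyExactly(Stream source, Stream destination, long count, int bufferSize = 81920)
    {
        byte[] buffer = new byte[bufferSize];
        long remaining = count;
        while (remaining > 0)
        {
            // Request no more than what is left of the payload.
            int toRead = (int)Math.Min(buffer.Length, remaining);
            int read = source.Read(buffer, 0, toRead);
            if (read == 0)
                throw new EndOfStreamException($"Stream ended with {remaining} payload bytes still expected.");
            destination.Write(buffer, 0, read);
            remaining -= read;
        }
    }
}
```

Capping each request at the remaining byte count is what keeps the copy from consuming anything that follows the payload on the same connection.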

I've been using the following code, and it works, but it seems there should be a more efficient way to read up to the delimiter. Since the minimum metadata size is 32 bytes, I start with a 34-byte buffer (32 bytes of metadata + 2 bytes of delimiter), read from the stream, and check for the delimiter. If the delimiter is found (the smallest possible metadata), the code breaks out of the loop and the rest of the stream contains the data. If not, the code loops, reading one byte at a time and checking the last two characters of the StringBuilder that holds what has been read so far, until the delimiter appears at the end.

(Code reduced for brevity; checks for negative cases etc. removed.)

string delim = "__";
StringBuilder sb = new StringBuilder();

byte[] buffer = new byte[1];
byte[] initialBuffer = new byte[34];
int bytesRead = stream.Read(initialBuffer, 0, 34);  // yes I check bytesRead in the actual code
sb.Append(Encoding.UTF8.GetString(initialBuffer, 0, bytesRead));

while (true)
{
    string delimCheck = sb.ToString((sb.Length - 2), 2);
    if (delimCheck.Equals(delim)) break;
    else
    {
        buffer = new byte[1];
        bytesRead = stream.Read(buffer, 0, 1); // yes I check bytesRead in the actual code
        sb.Append(Encoding.UTF8.GetString(buffer));
    }
}

The code works, but it seems really inefficient and slow to read one byte at a time to reach the end of the delimiter. Is anything readily apparent that might better optimize this code?

Thanks!

Answers (1)

Ben Voigt

Do you see those Read(array, offset, count) return values you are putting into a variable bytesRead and then happily ignoring?

Those (along with setting the socket in non-blocking mode) are the solution to your problem. Then you can access "everything received so far" without getting stuck waiting for enough extra data to arrive to fill your array.

Even in blocking mode, ignoring that return value is a bug, because Read can return fewer bytes than requested: for example, when the socket is gracefully shut down, you will get a partial read where bytesRead < bytesRequested.
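To make the point about return values concrete: a fixed-size read has to loop until the requested count has arrived. The question's initial 34-byte read is safe to demand in full this way, since every message contains at least 32 bytes of metadata plus the 2-byte delimiter. This is a minimal sketch; `StreamUtil.ReadFully` is a hypothetical helper name. On .NET 7 and later, `Stream.ReadExactly` provides the same behavior built in.

```csharp
using System.IO;

static class StreamUtil
{
    // Hypothetical helper: reads exactly `count` bytes into `buffer` at `offset`.
    // Stream.Read may return fewer bytes than requested, so keep calling it
    // until the request is satisfied. A return value of 0 means end of stream.
    public static void ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int read = stream.Read(buffer, offset, count);
            if (read == 0)
                throw new EndOfStreamException("Stream closed before the requested bytes arrived.");
            offset += read;
            count -= read;
        }
    }
}
```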


As for saving the extra data for later, Microsoft provides a class for that: System.IO.BufferedStream. Its documentation includes an example, introduced as follows:

The following code examples show how to use the BufferedStream class over the NetworkStream class to increase the performance of certain I/O operations. Start the server on a remote computer before starting the client. Specify the remote computer name as a command-line argument when starting the client. Vary the dataArraySize and streamBufferSize constants to view their effect on performance.

Not shown in the example is that you still need to put the socket into non-blocking mode to avoid having the BufferedStream block until an entire buffer chunk is received. The Socket class provides the Blocking property to make that easy.
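A minimal sketch of the BufferedStream approach, assuming the delimiter is the two ASCII underscore bytes from the question's `delim = "__"`: wrap the NetworkStream in a BufferedStream and call ReadByte until the delimiter appears. Each ReadByte is served from the BufferedStream's internal buffer when possible, so reading byte by byte no longer means one call into the socket per byte. Comparing raw bytes also avoids decoding one byte at a time with UTF-8, which garbles multi-byte characters. `DelimitedReader.ReadUntilDelimiter` is a hypothetical name, and socket configuration is omitted.

```csharp
using System;
using System.IO;

static class DelimitedReader
{
    // Hypothetical helper: reads from `stream` until the two-byte delimiter
    // (first, second) has been consumed and returns the bytes before it.
    // Nothing past the delimiter is taken from `stream`; if `stream` is a
    // BufferedStream, any extra bytes it fetched stay in its buffer.
    public static byte[] ReadUntilDelimiter(Stream stream, byte first, byte second)
    {
        var metadata = new MemoryStream();
        int previous = -1; // no byte seen yet
        while (true)
        {
            int current = stream.ReadByte(); // -1 signals end of stream
            if (current == -1)
                throw new EndOfStreamException("Stream ended before the delimiter was found.");
            if (previous == first && current == second)
            {
                // The first delimiter byte was already written; drop it.
                metadata.SetLength(metadata.Length - 1);
                return metadata.ToArray();
            }
            metadata.WriteByte((byte)current);
            previous = current;
        }
    }
}
```

A call site would wrap `client.GetStream()` (for an already-connected TcpClient) in a BufferedStream, pass `(byte)'_', (byte)'_'`, and decode the returned metadata once with Encoding.UTF8.GetString. The payload must then be read from that same BufferedStream, not from the underlying NetworkStream, because some payload bytes may already be sitting in its buffer.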
