strager

Reputation: 90022

TextReader.ReadLine returns incomplete lines

I am using a Socket to receive data via TCP, and TextReader.ReadLine to read lines from the connection. The problem arises when a full line has not yet been received: TextReader.ReadLine returns an incomplete string. I want it to return null instead, indicating that a full line could not be read. How can I do this?

Basically, I have this data incoming:

"hello\nworld\nthis is a test\n"

When I run ReadLine I get these in return:

"hello"
"world"
"this is a te"
<null>
<socket gets more data>
"st"
<null>

I do not want "this is a te" returned. Rather, I want ReadLine to hold back "this is a test" until the entire line has been received.

Code:

var endPoint = ...;
var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.IP);
socket.Connect(endPoint);

var stream = new NetworkStream(socket, true);

var messageBuffer = new StringBuilder();

// Data received async callback (called several times).
int bytesRead = stream.EndRead(result);
string data = Encoding.UTF8.GetString(readBuffer.Take(bytesRead).ToArray());
messageBuffer.Append(data);

using(var reader = new StringReader(messageBuffer.ToString()))
{
    // This loop does not know that Message.Read reads lines.  For all it knows, it could read bytes or words or the whole stream.
    Message msg;

    while((msg = Message.Read(reader)) != null)  // See below.
    {
        Console.WriteLine(msg.ToString());    // See example input/echo above.
    }

    messageBuffer = new StringBuilder(reader.ReadToEnd());
}

// Method of Message.
public static Message Read(TextReader reader)
{
    string line = reader.ReadLine();

    if(line == null)
        return null;

    return Message.FromRawString(line);
}

Thanks.

Upvotes: 0

Views: 2599

Answers (5)

strager

Reputation: 90022

I decided to write my own ReadLine parser-ish kinda thing. Here's the code:

// Async callback.
Message message;

while((message = Message.ReadBytes(messageBuffer)) != null)
{
    OnMessageReceived(new MessageEventArgs(message));
}

// Message class.
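public static Message ReadBytes(List<byte> data)
{
    // Look for the first line terminator in the raw bytes.
    int end = data.FindIndex(b => b == '\n' || b == '\r');

    // No complete line yet: leave the bytes in the buffer and report nothing.
    if(end == -1)
        return null;

    // Decode only the bytes of the complete line.
    string line = Encoding.UTF8.GetString(data.Take(end).ToArray());

    // Drop the line and its terminator from the buffer.
    data.RemoveRange(0, end + 1);

    // Skip empty lines (this also swallows the '\n' of a "\r\n" pair).
    if(line == "")
        return ReadBytes(data);

    return Message.FromRawString(line);
}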

Many thanks to @Jon Skeet, @Noldorin, and @Richard for their very helpful suggestions. Your combined efforts led me to my final solution. =]

Upvotes: 0

Richard

Reputation: 109015

Several issues can be seen here:

  1. The bytes of a single Unicode code point could be split across packets, so you need to keep your own Decoder instance (from Encoding.UTF8.GetDecoder()) around, since it remembers incomplete trailing bytes between calls. Alternatively, buffer up the complete message as byte[] and convert in one go when you know it is complete.

  2. You need a way of determining when you have received a complete message. You need to keep reading until it is complete (and handle the case where you start receiving the next message in the same Read call).
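To illustrate point 1, here is a minimal sketch (the class and method names are made up for this example, and the byte chunks stand in for two separate socket reads). It compares decoding each chunk on its own with decoding through a single Decoder that carries state between chunks:

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the code in the question.
public static class SplitCharacterDemo
{
    // Decodes every chunk independently. A multi-byte character that
    // straddles two chunks is corrupted, because no state is kept.
    public static string DecodeStateless(byte[][] chunks)
    {
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
            sb.Append(Encoding.UTF8.GetString(chunk));
        return sb.ToString();
    }

    // Decodes all chunks through one Decoder, which holds on to an
    // incomplete trailing byte sequence until the rest arrives.
    public static string DecodeStateful(byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            // GetMaxCharCount leaves room for a character completed
            // by bytes left over from the previous chunk.
            char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

For example, the two chunks { 0x68, 0xC3 } and { 0xA9, 0x0A } split the two-byte encoding of "é" down the middle; only the stateful version should reassemble it.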

Upvotes: 0

Jon Skeet

Reputation: 1500785

It sounds like the data is being sent with some extra delimiters. Assuming you're using a StreamReader over a network stream, it should behave exactly as you expect. I suggest you use Wireshark to look at the exact data your socket is receiving.

I also doubt that it's returning null and then another line - are you sure you don't mean it returns an empty string and then another line?

EDIT: Now you've posted the code, the reason is a lot clearer - you're decoding just a single buffer at a time. That really won't work, and could break in much more serious ways. The buffer might not even break at a character boundary.

To be honest, it'll be a lot easier to read synchronously and use a StreamReader. Doing it asynchronously, you should use a System.Text.Decoder which can store any previous state (from the end of the previous buffer) if it needs to. You'll also have to store however much of the previous line was read - and I suspect you won't be able to use TextReader at all, or at least you'll have to have special handling for the case where the final character is '\r' or '\n'. Bear in mind that one buffer could end with '\r' and the next buffer start with '\n', representing a single line break between them. See how difficult it can get?

Do you definitely, definitely need to handle this asynchronously?

EDIT: It sounds like you could do with something which you can basically dump data into, and attach a "LineCompleted" event handler. You could attach the event handler to start with and then just keep dumping data into it until there's no more data (at which point you'd need to tell it that the data has finished). If that sounds appropriate, I might try to work on such a class for MiscUtil - but I'd be unlikely to finish it within the next week (I'm really busy at the moment).
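A rough sketch of what such a class might look like follows. This is not the MiscUtil class (the name LineAccumulator and its members are invented for illustration) and it assumes UTF-8 input. It keeps a Decoder for split characters and remembers a trailing '\r' so that a "\r\n" pair split across two buffers counts as one line break:

```csharp
using System;
using System.Text;

// Hypothetical push-style line splitter: dump received bytes in,
// get complete lines out through the LineCompleted event.
public sealed class LineAccumulator
{
    private readonly Decoder decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder current = new StringBuilder();
    private bool lastWasCarriageReturn;

    public event Action<string> LineCompleted;

    // Call with each received buffer. Text after the last line break
    // is kept until a later call completes the line.
    public void Append(byte[] buffer, int offset, int count)
    {
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(count)];
        int n = decoder.GetChars(buffer, offset, count, chars, 0);

        for (int i = 0; i < n; i++)
        {
            char c = chars[i];
            if (c == '\n' && lastWasCarriageReturn)
            {
                // Second half of "\r\n"; the line was already emitted.
                lastWasCarriageReturn = false;
                continue;
            }
            lastWasCarriageReturn = (c == '\r');
            if (c == '\r' || c == '\n')
                Emit();
            else
                current.Append(c);
        }
    }

    // Call once the data has finished, to flush a final unterminated line.
    public void Complete()
    {
        if (current.Length > 0)
            Emit();
    }

    private void Emit()
    {
        string line = current.ToString();
        current.Clear();
        LineCompleted?.Invoke(line);
    }
}
```

In the asynchronous read callback you would call Append with the bytes just received, and Complete when the read returns zero bytes. Unlike ReadLine on a partial buffer, nothing is reported for "this is a te" until its line break arrives.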

Upvotes: 3

Noldorin

Reputation: 147340

See my answer to a previous very similar question. It relates to asynchronous socket I/O and reading lines in a stream-like fashion. Hope that helps.

Upvotes: 0

MarkusQ

Reputation: 21950

Have a buffer (starts empty), and each time you read

  • if there is an \n in the buffer, remove everything up to and including it and return it
  • read what you can, and append what you read to the buffer
  • if the read fails due to eof, return and clear the contents unless the buffer is empty, in which case propagate the eof.
  • if there is an \n in what you read, try again from the top, else return null

Note that this will do what you want but, with any such scheme, you now have to worry about what to do with lines that are too long for your buffer.
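The steps above can be sketched roughly as follows. The names BufferedLineReader, LineStatus and TryReadLine are invented for this example, and it assumes the underlying TextReader returns whatever is currently available from Read (0 meaning eof). An enum is used instead of a bare null so the caller can tell "no full line yet" apart from eof:

```csharp
using System;
using System.IO;
using System.Text;

public enum LineStatus { Line, Incomplete, EndOfStream }

// Hypothetical reader implementing the buffer scheme described above.
public sealed class BufferedLineReader
{
    private readonly TextReader source;
    private readonly StringBuilder buffer = new StringBuilder();
    private readonly char[] scratch;

    public BufferedLineReader(TextReader source, int chunkSize = 1024)
    {
        this.source = source;
        this.scratch = new char[chunkSize];
    }

    public LineStatus TryReadLine(out string line)
    {
        // If a complete line is already buffered, return it.
        if (TryTakeLine(out line))
            return LineStatus.Line;

        // Read what we can and append it to the buffer.
        int n = source.Read(scratch, 0, scratch.Length);
        if (n == 0)
        {
            // Eof: hand back any leftover text, otherwise propagate the eof.
            if (buffer.Length == 0)
                return LineStatus.EndOfStream;
            line = buffer.ToString();
            buffer.Clear();
            return LineStatus.Line;
        }
        buffer.Append(scratch, 0, n);

        // Try again now that more data has arrived.
        return TryTakeLine(out line) ? LineStatus.Line : LineStatus.Incomplete;
    }

    // Removes and returns everything up to and including the first '\n'.
    private bool TryTakeLine(out string line)
    {
        for (int i = 0; i < buffer.Length; i++)
        {
            if (buffer[i] == '\n')
            {
                line = buffer.ToString(0, i);
                buffer.Remove(0, i + 1);
                return true;
            }
        }
        line = null;
        return false;
    }
}
```

This sketch lets the buffer grow without limit, so the caveat about overly long lines still applies; a real version would want a cap.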

-- MarkusQ

Upvotes: 0
