Malcolm McCaffery

Reputation: 2576

Content-Length Occasionally Wrong on Simple C# HTTP Server

For some experimentation I was working with the Simple HTTP Server code here.

In one case I wanted it to serve some ANSI-encoded text configuration files. I am aware there are other issues with this code, but the only one I'm currently concerned with is that Content-Length is wrong, and only for certain text files.

Example code:

Output stream initialisation:

outputStream = new StreamWriter(new BufferedStream(socket.GetStream()));

The handling of HTTP GET:

public override void handleGETRequest(HttpProcessor p)
{

    if (p.http_url.EndsWith(".pac"))
    {
        string filename = Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location), p.http_url.Substring(1));
        Console.WriteLine(string.Format("HTTP request for : {0}", filename));
        if (File.Exists(filename))
        {
            FileInfo fi = new FileInfo(filename);
            DateTime lastWrite = fi.LastWriteTime;

            Stream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
            StreamReader sr = new StreamReader(fs);
            string result = sr.ReadToEnd().Trim();
            Console.WriteLine(fi.Length);
            Console.WriteLine(result.Length);
            p.writeSuccess("application/x-javascript-config",result.Length,lastWrite);
            p.outputStream.Write(result);
            // fs.CopyTo(p.outputStream.BaseStream);
            p.outputStream.BaseStream.Flush();
            fs.Close();
        }
        else
        {
            Console.WriteLine("404 - FILE not found!");
            p.writeFailure();
        }
    }

}  

public void writeSuccess(string content_type, long length, DateTime lastModified)
{
    outputStream.Write("HTTP/1.0 200 OK\r\n");
    outputStream.Write("Content-Type: " + content_type + "\r\n");
    outputStream.Write("Last-Modified: {0}\r\n", lastModified.ToUniversalTime().ToString("r"));
    outputStream.Write("Accept-Range: bytes\r\n");
    outputStream.Write("Server: FlakyHTTPServer/1.3\r\n");
    outputStream.Write("Date: {0}\r\n", DateTime.Now.ToUniversalTime().ToString("r"));
    outputStream.Write(string.Format("Content-Length: {0}\r\n\r\n", length));
}

For most files I've tested, Content-Length is correct. However, when testing with the HTTP debugging tool Fiddler, a protocol violation is sometimes reported on Content-Length.

For example, Fiddler says:

Request Count: 1
Bytes Sent: 303 (headers:303; body:0)
Bytes Received: 29,847 (headers:224; body:29,623)

So Content-Length should be 29623, but the generated HTTP header is:

Content-Length: 29617

I saved the HTTP body from Fiddler and visually compared the files, but couldn't notice any difference. Then I loaded them into BeyondCompare's hex compare, which shows several differences like these:

Original File: 2D 2D 96       20 2A 2F
HTTP Content : 2D 2D EF BF BD 20 2A 2F

Original File: 27 3B 0D 0A 09 7D 0D 0A 0D 0A 09
HTTP Content : 27 3B    0A 09 7D    0A    0A 09

I suspect the problem is related to encoding, but I'm not exactly sure. I'm only serving ANSI-encoded files, no Unicode.

I got the file to serve correctly, with the right Content-Length, by replacing a byte sequence in the file. I made this change in 3 places in the file:

2D 2D 96 (--–) to 2D 2D 2D (---)

Upvotes: 1

Views: 2423

Answers (1)

Jason Hoetger

Reputation: 8147

Based on the bytes you pasted, it looks like there are a couple of things going wrong here. First, it seems that CRLF in your input file (0D 0A) is being converted to just LF (0A). Second, it looks like the character encoding is changing, either when reading the file into a string or when writing the string to the HTTP client.

The HTTP Content-Length represents the number of bytes in the body, whereas string.Length gives you the number of characters (UTF-16 code units) in the string. Unless your file is exclusively using the 128 ASCII characters (which precludes non-English characters as well as special windows-1252 characters like the euro sign), it's unlikely that string.Length will exactly equal the length of the string encoded in UTF-8. A single-byte encoding such as ISO-8859-1 does produce one byte per character, but it cannot represent every character.
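
To make the mismatch concrete, here is a minimal sketch using the first byte sequence from the question's hex dump. It assumes the default encodings are in play: StreamReader decodes as UTF-8 unless told otherwise, and StreamWriter encodes as UTF-8. The class and method names (LengthDemo, DecodeAsUtf8) are made up for illustration.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Decode raw bytes as UTF-8, replacing invalid bytes with U+FFFD,
    // which is what a StreamReader with default settings effectively does.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        // "--", then the windows-1252 en dash (0x96), then " */"
        byte[] fileBytes = { 0x2D, 0x2D, 0x96, 0x20, 0x2A, 0x2F };

        string text = DecodeAsUtf8(fileBytes);       // 0x96 is not valid UTF-8
        byte[] wire = Encoding.UTF8.GetBytes(text);  // what goes out on the socket

        Console.WriteLine(text.Length);                 // 6 characters
        Console.WriteLine(wire.Length);                 // 8 bytes
        Console.WriteLine(BitConverter.ToString(wire)); // 2D-2D-EF-BF-BD-20-2A-2F
    }
}
```

Each 0x96 turns into the three bytes EF BF BD, so every occurrence adds 2 bytes that string.Length does not count. If the byte occurs three times, as the three edits described in the question suggest, that would account for the 6-byte gap between 29617 and 29623.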

If you convert the string to a byte[] before sending it to the client, you'll be able to get the "true" Content-Length. However, you'll still end up with mangled text if you didn't read the file using the proper encoding. (Whether you specify the encoding or not, a conversion is happening when reading the file into a string of Unicode characters.)
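
As a rough sketch of that conversion, the following decodes the file bytes with an explicit encoding and takes the length of the encoded byte[] instead of the string. It assumes "ANSI" here means windows-1252 and that you are on .NET Core 3.0 or later, where code page encodings must be registered first (on .NET Framework, drop the RegisterProvider line). BodyEncoder, GetAnsiEncoding and Reencode are hypothetical names, not part of the server code in the question.

```csharp
using System;
using System.Text;

public static class BodyEncoder
{
    // Assumption: the files are windows-1252 encoded.
    public static Encoding GetAnsiEncoding()
    {
        // Required on .NET Core / .NET 5+ before code page 1252 can be looked up.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Decode with the file's encoding, then encode with the encoding used on the wire.
    public static byte[] Reencode(byte[] fileBytes, Encoding fileEncoding, Encoding wireEncoding)
    {
        string text = fileEncoding.GetString(fileBytes).Trim();
        return wireEncoding.GetBytes(text);
    }

    public static void Main()
    {
        // In the real handler these bytes would come from File.ReadAllBytes(filename).
        byte[] fileBytes = { 0x2D, 0x2D, 0x96, 0x20, 0x2A, 0x2F };

        byte[] body = Reencode(fileBytes, GetAnsiEncoding(), Encoding.UTF8);

        Console.WriteLine(body.Length);                 // 8, the value to send as Content-Length
        Console.WriteLine(BitConverter.ToString(body)); // 2D-2D-E2-80-93-20-2A-2F
    }
}
```

Here 0x96 is decoded as a real en dash (U+2013) and sent as E2 80 93, so the text survives and body.Length matches what the client receives. Passing the same windows-1252 encoding as wireEncoding would instead reproduce the original bytes unchanged.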

I highly recommend specifying the charset in the Content-Type header (e.g. application/x-javascript-config;charset=utf-8). It doesn't matter whether your charset is utf-8, utf-16, iso-8859-1, windows-1251, etc., as long as it's the same character encoding you use when converting your string into a byte[].
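
One way to keep the charset, the body bytes and Content-Length in agreement is to derive all three from a single Encoding object. The sketch below is only an illustration of that idea, not a drop-in replacement for writeSuccess: ResponseWriter and WriteResponse are hypothetical names, and it writes to the raw Stream rather than through the StreamWriter so that no second encoding step can sneak in.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseWriter
{
    public static void WriteResponse(Stream output, string contentType, string body, Encoding encoding)
    {
        // Encode the body first so the header can report the real byte count.
        byte[] bodyBytes = encoding.GetBytes(body);

        string headers =
            "HTTP/1.0 200 OK\r\n" +
            "Content-Type: " + contentType + "; charset=" + encoding.WebName + "\r\n" +
            "Content-Length: " + bodyBytes.Length + "\r\n\r\n";

        // The header lines here are plain ASCII.
        byte[] headerBytes = Encoding.ASCII.GetBytes(headers);
        output.Write(headerBytes, 0, headerBytes.Length);
        output.Write(bodyBytes, 0, bodyBytes.Length);
        output.Flush();
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // "--", an en dash, then " */": 6 characters, 8 bytes in UTF-8.
            WriteResponse(ms, "application/x-javascript-config", "--\u2013 */", Encoding.UTF8);
            Console.WriteLine(Encoding.UTF8.GetString(ms.ToArray()));
        }
    }
}
```

Because encoding.WebName supplies the charset label ("utf-8" for Encoding.UTF8), swapping in a different Encoding changes the header and the bytes together.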

Upvotes: 4
