Reputation: 25287
I'm trying to learn more about how the web and TCP work by implementing a web TCP client myself.
Currently, my web request function looks like this:
public string SendWebRequest(SocketWebRequest request)
{
    using (NetworkStream ns = tc.GetStream())
    {
        using (System.IO.StreamReader sr = new System.IO.StreamReader(ns))
        {
            request.WriteTo(ns);
            ns.Flush();
            var statusLine = sr.ReadLine();
            ProcessStatusLine(statusLine);
            Headers = ReadHeaders(sr);
            ProcessCookies(request.Host);
            int contentLength = 0;
            if (Headers.ContainsKey("Content-Length"))
            {
                foreach (var cl in Headers["Content-Length"])
                {
                    int buf;
                    if (int.TryParse(cl, out buf))
                    {
                        contentLength = buf;
                        break;
                    }
                }
            }
            if (contentLength == 0)
            {
                return "";
            }
            byte[] content = new byte[contentLength];
            if (IsGziped())
            {
                MemoryStream decompressed = new MemoryStream();
                using (var zs = new GZipStream(ns, CompressionMode.Decompress))
                {
                    while (true)
                    {
                        var buf = new byte[1024];
                        int read = zs.Read(buf, 0, buf.Length);
                        if (read == 0)
                        {
                            break;
                        }
                        decompressed.Write(buf, 0, read);
                    }
                }
                content = decompressed.ToArray();
            }
            else
            {
                using (BinaryReader rdr = new BinaryReader(ns))
                {
                    rdr.Read(content, 0, content.Length);
                }
            }
            var encoding = GetEncoding();
            return encoding.GetString(content.ToArray());
        }
    }
}
The request looks like this:
GET http://www.youtube.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, */*
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host:www.youtube.com
And the response headers look like this:
HTTP/1.1 200 OK
Date: Sat, 25 Aug 2012 19:46:51 GMT
Server: Apache
X-Content-Type-Options: nosniff
Content-Encoding: gzip
Set-Cookie: use_hitbox=d5c5516c3379125f43aa0d495d100d6ddAEAAAAw; path=/; domain=.youtube.com
Set-Cookie: VISITOR_INFO1_LIVE=av7rkkf4Sfw; path=/; domain=.youtube.com; expires=Mon, 22-Apr-2013 19:46:51 GMT
Expires: Tue, 27 Apr 1971 19:44:06 EST
Cache-Control: no-cache
P3P: CP="This is not a P3P policy! See //support.google.com/accounts/bin/answer.py?answer=151657&hl=en-US for more info."
X-Frame-Options: SAMEORIGIN
Content-Length: 18977
Content-Type: text/html; charset=utf-8
After this, the first int read = zs.Read(buf, 0, buf.Length);
sometimes works, but often fails with the following exception:
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
I've tried reading the data as a string, and it looks encoded. YouTube works fine via a browser.
Why am I getting this, and how should I fix it?
UPDATE
It looks like some sort of error during transmission. In 5 cases out of 10 it works; in the other 5 it fails for no apparent reason.
Here's the code for IsGziped():
bool IsGziped()
{
    foreach (var h in Headers["Content-Encoding"])
    {
        if (h.ToLowerInvariant().Contains("gzip"))
        {
            return true;
        }
    }
    return false;
}
Upvotes: 0
Views: 1043
Reputation: 21
You can cleanly separate the response stream into headers and body with the following code.
// Read response.
var buffer2 = new byte[4096];
var hd = new MemoryStream();
var response = new MemoryStream();
var endHeader = false;
int bytes;
do
{
    // Read from your NetworkStream object (named "stream" here).
    bytes = stream.Read(buffer2, 0, buffer2.Length);
    if (!endHeader)
    {
        var startIndex = 0;
        if (IsContainsHeaderCrLf(buffer2, out startIndex))
        {
            endHeader = true;
            hd.Write(buffer2, 0, startIndex);
            response.Write(buffer2, startIndex + 4, bytes - startIndex - 4);
        }
        else
        {
            hd.Write(buffer2, 0, bytes);
        }
    }
    else
    {
        response.Write(buffer2, 0, bytes);
    }
} while (bytes != 0);
var headertxt = System.Text.Encoding.UTF8.GetString(hd.ToArray());
var unziptxt = "";
var responsetxt = "";
if (headertxt.Contains("gzip"))
{
    unziptxt = System.Text.Encoding.UTF8.GetString(Decompress(response.ToArray()));
}
else
{
    responsetxt = System.Text.Encoding.UTF8.GetString(response.ToArray());
}
return headertxt + "\r\n\r\n" + unziptxt + responsetxt;
//...
private bool IsContainsHeaderCrLf(byte[] buffer, out int startIndex)
{
    for (var i = 0; i <= buffer.Length - 4; i++)
    {
        if (buffer[i] == 13 && buffer[i + 1] == 10 && buffer[i + 2] == 13 && buffer[i + 3] == 10)
        {
            startIndex = i;
            return true;
        }
    }
    startIndex = -1;
    return false;
}
Bonus: the decompression code.
static byte[] Decompress(byte[] gzip)
{
    // Create a GZip stream in decompression mode, then repeatedly read
    // from it into a buffer and write the decompressed bytes out.
    using (var stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        var buffer = new byte[size];
        using (var memory = new MemoryStream())
        {
            var count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
Upvotes: 0
Reputation: 171178
StreamReader does not necessarily read just the required number of bytes; it can read more due to internal buffering. This causes compressed bytes to be taken from the NetworkStream ns and put into the StreamReader's internal buffer. Once those bytes have been consumed, the GZipStream can no longer read them.
You probably need a custom header-parsing solution that works at the binary level. There is no way to restrict a StreamReader to read the least possible number of bytes; StreamReader is not designed to be used together with other readers on the same stream.
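To make that concrete, here is a minimal sketch of binary-level header parsing (BinaryHeaderReader and ReadHeaders are illustrative names). It reads one byte at a time until the blank line that ends the headers, so nothing past the headers is consumed and the stream is left positioned at the first body byte, ready to be handed to GZipStream:

```csharp
using System;
using System.IO;
using System.Text;

static class BinaryHeaderReader
{
    // Reads HTTP headers byte-by-byte so that no body bytes are
    // buffered away. A small state machine tracks progress through
    // the terminating "\r\n\r\n"; the stream is left positioned at
    // the first byte of the body.
    public static string ReadHeaders(Stream stream)
    {
        var bytes = new MemoryStream();
        int state = 0; // how much of \r\n\r\n has been seen
        int b;
        while (state < 4 && (b = stream.ReadByte()) != -1)
        {
            bytes.WriteByte((byte)b);
            if (b == '\r') state = (state == 2) ? 3 : 1;
            else if (b == '\n') state = (state == 1) ? 2 : (state == 3 ? 4 : 0);
            else state = 0;
        }
        return Encoding.ASCII.GetString(bytes.ToArray());
    }
}
```

Byte-at-a-time reads are slow over a raw NetworkStream, but since headers are small this is usually acceptable; the key property is that the GZipStream constructed afterwards sees the compressed body from its very first byte.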
Upvotes: 1