NickSoft

Reputation: 3335

php and Content-Length header connection stall

I have a PHP website. Since I'm using a template engine and I always build the HTML in one shot, I have the size of the HTML document up front. So I decided to set the Content-Length header for better performance. If I don't set it, the document is transferred using chunked encoding.

The PHP code for the HTML output looks like this:

header('Accept-Ranges: none');
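// Note: strlen() counts bytes, not characters, which is exactly what
// Content-Length needs -- assuming nothing re-encodes or compresses
// the output after this point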
header('Content-Length: '.strlen($content));

echo $content;

I tested it under Windows in Chrome, IE, Firefox and Safari - it works fine. However, the Microsoft Bing bot (via Bing Webmaster Tools) said that the website does not respond. I decided to investigate, and here is what I found out:

elinks on CentOS 5 was the only HTTP client I found that has problems accessing the site. However, I don't know how to get debug information out of it.

Questions:

  1. Can someone tell me how to get debug info out of elinks? Is it possible to get a raw dump of the HTTP response plus headers, or some kind of error log?
  2. Any idea why the stall happens in one client and doesn't happen in another?
  3. It's most probably the incorrect Content-Length header that's causing the problem, because when I remove it, the site works fine in elinks and Bing. What could cause a content-length difference?
  4. Any other HTTP clients to test with?

All tests are done on the same web server, with the same PHP version, the same web page and the same content. One thing I can think of is the UTF-8 byte order mark (the few bytes sometimes placed at the front of a text file).
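If a BOM were the cause, it would be easy to rule out; a quick, hypothetical check on the rendered page from the code above:

// A UTF-8 BOM is the 3-byte sequence EF BB BF; if the template engine
// ever emits one, strip it before measuring, just to rule it out.
if (substr($content, 0, 3) === "\xEF\xBB\xBF") {
    $content = substr($content, 3);
}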

Here is a dump of headers with wget:

wget dev.site.com/ --server-response -O /dev/null
--2013-11-09 01:32:37--  http://dev.site.com/
Resolving dev.site.com... 127.0.0.1
Connecting to dev.site.com|127.0.0.1|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 08 Nov 2013 23:32:37 GMT
  Server: Apache
  Set-Cookie: lng=en; expires=Wed, 07-May-2014 23:32:37 GMT; path=/; domain=dev.site.com
  Last-Modified: Fri, 08 Nov 2013 23:32:37 GMT
  Cache-Control: must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
  Expires: 0
  Set-Cookie: PHPSESSID=8a1e9b871474b882e1eef4ca0dfea0fc; expires=Thu, 06-Feb-2014 23:32:37 GMT; path=/
  Content-Language: en
  Set-Cookie: hc=1518952; expires=Mon, 17-Nov-2036 00:38:00 GMT; path=/; domain=dev.site.com
  Accept-Ranges: none
  Content-Length: 16970
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=UTF-8
Length: 16970 (17K) [text/html]
Saving to: “/dev/null”

100%[===================================================================================================================================================================================================>] 16,970      --.-K/s   in 0.1s

2013-11-09 01:32:37 (152 KB/s) - “/dev/null” saved [16970/16970]

update:

I was able to reproduce the problem, but only on the production server. One difference I noticed between the working and non-working elinks is that the non-working one sends this header: Accept-Encoding: gzip

Of course, if the output is gzipped, the size will be different. zlib.output_compression is On in php.ini, so I guess that could be the problem. Also, output_buffering is 4096. What's strange is that most browsers use compression when available, yet they work. I'll try again in a web browser.
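Here is a rough sketch of the mismatch, with gzencode() standing in for the compression that zlib.output_compression applies on the wire:

$content = str_repeat('<p>Hello, world!</p>', 800);

$uncompressed = strlen($content);            // what the script measures
$compressed = strlen(gzencode($content));    // roughly what gets sent

// Content-Length announces $uncompressed bytes, but only $compressed
// bytes ever arrive, so a strict client keeps waiting for the rest.
echo "uncompressed: $uncompressed, compressed: $compressed\n";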

Yes, the browser (Chrome) also asks for compression, and gzip shows up in the response headers:

Content-Length: 15916
Content-Encoding: gzip

View source shows exactly 15916 bytes. Chrome has an option to show raw headers as well as parsed ones. What could be happening is that Chrome actually decompresses the data before counting. It sounds strange, but it's the only explanation for why GUI web browsers work while some lower-level clients don't.

Upvotes: 1

Views: 4724

Answers (3)

Matthew Clark

Reputation: 1965

I had the same problem -- I was trying to set the Content-Length header without realizing that the length I measured inside the buffer would be larger than the actual gzipped output (and yes, it seemed like the browser was hung). I stumbled upon this Q&A after I'd already solved my problem (solution below).

@Etherealone is spot-on with one point:

The connection does not stall. Your browser is waiting for more data to come, but the compressed data size is smaller than what the browser is waiting for.

@Etherealone and @NickSoft both kinda hinted at this, but didn't actually say it: a Content-Length header for dynamically-generated content isn't necessary, and the server should instead send a Transfer-Encoding: chunked header. This tells the browser to keep the connection open until it receives a zero-length chunk, which signifies the end of the content.

However, chunking the transfer does add a bit of overhead, so wanting to specify a Content-Length certainly doesn't hurt. @NickSoft had the right idea, but it doesn't have to be quite so complicated.

So, if you insist on having a Content-Length header instead of letting the server chunk the content, all you have to do is buffer twice: once for compression, then again so you can measure the size and send the Content-Length header:

<?php

// "Outer" buffer to capture content and size of "inner" buffer and send content length header
ob_start();

// "Inner" buffer for compression
ob_start('ob_gzhandler');
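// (ob_gzhandler only compresses when the client's Accept-Encoding
// header allows it; otherwise it passes the content through unchanged)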

// Do stuff...
echo $content;

// Flush the inner buffer, the contents of which is GZip'd
ob_end_flush();

// Measure the inner buffer size and set the header
header('Content-Length: ' . ob_get_length());  

// Send the outer buffer
ob_end_flush();

?>

After I implemented this, I saw the new Content-Length header; the Transfer-Encoding: chunked header disappeared; and the "hung" browser symptom went away (the browser got all of the content and closed the connection).

Upvotes: 0

NickSoft

Reputation: 3335

There is no nice and clean solution. I would love to be able to set the zlib buffer size with:

zlib.output_compression = 131072

if I'm sure the page won't be more than 128K (uncompressed). However, there is no way to get the compressed size of the buffer.
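A sketch of why that dead-ends: with zlib.output_compression = On, the compression happens below PHP's userland output buffers, so the script only ever sees the uncompressed size:

ob_start();
echo str_repeat('x', 100000);

// Reports 100000 -- the uncompressed size. zlib compresses *after*
// this buffer flushes, so the announced length is wrong on the wire.
header('Content-Length: ' . ob_get_length());

ob_end_flush();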

So there are two solutions:

  1. turn off output compression or do not set Content-Length ... which is not much of a solution, but it works
  2. replace the zlib compression handler with:

ob_start(); // start the outer (normal) buffer
ob_start("ob_gzhandler"); // start the gzip buffer
echo $content;
ob_end_flush(); // end the gzip buffer; the gzipped content lands in the outer buffer

$gzippedContent = ob_get_contents(); // grab the gzipped content to measure it
header('Content-Length: '.strlen($gzippedContent));
ob_end_flush(); // flush the gzipped content to the client

But make sure that zlib.output_compression is Off.
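For example, a defensive guard at the top of the script (a sketch; the setting can still be changed at runtime as long as no output has been sent yet):

// Avoid double compression: disable zlib's handler before ob_gzhandler runs.
if (ini_get('zlib.output_compression')) {
    ini_set('zlib.output_compression', 'Off');
}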

Even though the PHP manual says that zlib.output_compression is preferred, I doubt that using ob_gzhandler will dramatically reduce performance.

You can set the compression level with:

ini_set('zlib.output_compression_level', 4);

I tested it, and it works both with gzip enabled in the client/browser and with gzip disabled:

wget --header='Accept-Encoding: gzip,deflate' -O ./page.html.gz http://www.site.com/ && gunzip page.html.gz
wget -O ./page.html http://www.site.com/

Upvotes: 1

Etherealone

Reputation: 3558

The answer is already there: Content-Length has to be the size that is actually being sent, which is the size after $content is compressed. The size of the content you see in view-source is naturally the decompressed size.

The connection does not stall. Your browser is waiting for more data to come, but the compressed data size is smaller than what the browser is waiting for. If your server eventually times out the connection, your browser will assume it got all the data and display it. It works with wget and the like because they don't send Accept-Encoding headers, so the server does not send a compressed response.

If you must, you could disable compression, compress $content manually, and send it along with the appropriate Content-Encoding header.
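A minimal sketch of that route, assuming zlib.output_compression is Off so nothing compresses the output a second time ($page is a hypothetical variable holding the rendered HTML):

$content = $page; // the rendered page

$acceptsGzip = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false;

if ($acceptsGzip) {
    $content = gzencode($content); // compress once, up front
    header('Content-Encoding: gzip');
}

// Measured after compression, so it matches the bytes on the wire.
header('Content-Length: ' . strlen($content));
echo $content;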

Another option is to download the page compressed (send an Accept-Encoding: gzip header with wget; I guess it won't get decompressed, since compression is not enabled by default, though wget might support it after all, I don't know; I know cURL won't decompress it, so you can use that), take the size of the response minus the headers (i.e., only the data after the \r\n\r\n sequence that ends the headers), and use that size when sending Content-Length. But of course, changing the compression level, or maybe the implementation (different web servers/modules, or different versions of the same web server/module), will change the size of the resulting compressed data, so this is a very fragile way to do it.

Why are you modifying Content-Length anyway? PHP or the web server is supposed to handle that.

Upvotes: 1
