Reputation: 537
I have tried a few things to enable gzip compression using PHP Simple HTML DOM Parser, but nothing has worked so far. Using ini_set I've managed to change the user agent, so I figured it might also be possible to enable gzip compression the same way?
include("simpdom/simple_html_dom.php");
ini_set('zlib.output_compression', 'On');
$url = 'http://www.whatsmyip.org/http_compression/';
$html = file_get_html($url);
print $html;
The website above tests it. Please let me know if I am going about this the wrong way completely.
====
For anyone else trying to achieve the same thing, it's best to just use cURL, then use the dom parser like so:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$html = str_get_html($return); // Parse the (already decompressed) page
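The same options can be applied in one call with curl_setopt_array, which avoids repeating the handle name on every line. This is only a sketch: gzip_curl_options() is a hypothetical helper name, not part of cURL or Simple HTML DOM, and it assumes the curl extension is loaded. Nothing is fetched here; the handle is only configured.

```php
<?php
// Hypothetical helper: returns the cURL options used in the answer above.
function gzip_curl_options(string $url, int $timeout = 5): array
{
    return [
        CURLOPT_URL            => $url,     // Target site
        CURLOPT_RETURNTRANSFER => true,     // Return the page as a string
        CURLOPT_ENCODING       => 'gzip',   // Send Accept-Encoding: gzip and decode the reply
        CURLOPT_TIMEOUT        => $timeout, // Give up after this many seconds
        CURLOPT_FOLLOWLOCATION => true,     // Follow redirects
    ];
}

$options = gzip_curl_options('http://www.whatsmyip.org/http_compression/');

$ch = curl_init();
$applied = curl_setopt_array($ch, $options); // true if every option was accepted
```

From there, curl_exec($ch) and str_get_html() work as in the snippet above.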
Upvotes: 4
Views: 4654
Reputation: 6058
Just add the following line at the very top of the PHP script that outputs the data:
ob_start("ob_gzhandler");
-------Update--------
You can also try enabling gzip compression sitewide via a .htaccess file. Something like this should gzip your site's content, except images:
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
Upvotes: 0
Reputation: 11068
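CURLOPT_ENCODING controls the request side: it tells the remote server that your client accepts gzipped data, and cURL decodes the response for you. The server settings (ob_start("ob_gzhandler") or php_ini..) only tell your own server to OUTPUT gzipped data, which has no effect on pages you fetch from elsewhere.
Without it, the request looks just like one from a browser that doesn't support gzip, so the remote server sends the page uncompressed. To accept gzip data, you have to use curl so you can make that distinction.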
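To make the two sides concrete, here is a minimal offline sketch. It assumes PHP's zlib extension is available; decode_body() is a hypothetical helper written for illustration, not something cURL or Simple HTML DOM provides. gzencode() stands in for a gzip-enabled server, and the helper does roughly what cURL does for you when CURLOPT_ENCODING is set.

```php
<?php
// Hypothetical helper: decode a response body according to its
// Content-Encoding header value (null when the header is absent).
function decode_body(string $body, ?string $contentEncoding): string
{
    // A client that never advertised gzip gets a plain body and no header.
    if ($contentEncoding === null || stripos($contentEncoding, 'gzip') === false) {
        return $body;
    }
    $decoded = gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Body was labelled gzip but could not be decoded');
    }
    return $decoded;
}

// Server side, simulated: compress the page into gzip format.
$page = '<html><body>Hello</body></html>';
$compressed = gzencode($page);

// Client side: a gzip-labelled body is decoded, a plain body passes through.
$fromGzip  = decode_body($compressed, 'gzip');
$fromPlain = decode_body($page, null);
```

Both calls end up with the same HTML; the only difference is whether the body travelled compressed.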
Upvotes: 1