Reputation: 13334
How can I determine whether a string was compressed with gzcompress() (apart from comparing the sizes of the string before and after calling gzuncompress(), or would that be the proper way of doing it)?
Upvotes: 11
Views: 13400
Reputation: 112557
You can simply try gzuncompress() on the data, as noted by @DiDiegodaFonseca. If it fails, the data was not made by gzcompress(), or it was not faithfully transmitted.
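A minimal sketch of that trial approach, assuming the data is already in a PHP string. The helper name try_gzuncompress() is hypothetical; the @ operator silences the warning gzuncompress() raises on invalid input, so only the return value is checked.

```php
<?php
// Hypothetical helper: returns the decompressed data, or null if $data is not
// a valid zlib stream (not gzcompress() output, or corrupted in transit).
function try_gzuncompress(string $data): ?string
{
    // gzuncompress() returns false (and emits a warning) on invalid input.
    $result = @gzuncompress($data);
    return $result === false ? null : $result;
}

$packed = gzcompress('hello world');
var_dump(try_gzuncompress($packed));        // string(11) "hello world"
var_dump(try_gzuncompress('hello world'));  // NULL
```

This costs a full decompression attempt, so for large inputs the header check described next is a cheaper first filter.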
If you really want to, you can check the first two bytes for a zlib header (not a gzip header, as incorrectly suggested in the accepted answer). gzcompress() produces a zlib stream, not a gzip stream. gzencode() is what produces a gzip stream, and gzdeflate() produces a raw deflate stream.
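To see the difference between the three formats, a small sketch that prints the leading bytes each encoder produces (the sample input is arbitrary):

```php
<?php
// Compare the leading bytes produced by PHP's three zlib-family encoders.
$input = str_repeat('example text ', 10);

$formats = [
    'gzcompress (zlib, RFC 1950)'       => gzcompress($input),
    'gzencode (gzip, RFC 1952)'         => gzencode($input),
    'gzdeflate (raw deflate, RFC 1951)' => gzdeflate($input),
];

foreach ($formats as $name => $bytes) {
    // bin2hex() makes the binary prefix readable.
    printf("%-34s %s\n", $name, bin2hex(substr($bytes, 0, 3)));
}
```

With the default compression level, the gzcompress line typically starts with 789c and the gzencode line with 1f8b08; the raw deflate output has no fixed signature at all.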
RFC 1950 describes the zlib header. It is two bytes, where the two bytes taken as a big-endian 16-bit unsigned integer must be a multiple of 31. In addition to checking that, you can check that the low four bits of the first byte are 8 (binary 1000) and that the high bit of the first byte is zero.
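A sketch of those three header checks in PHP; has_zlib_header() is a hypothetical name. A true result only means the first two bytes are plausible; random data can still pass by chance, so it does not prove the rest of the stream is valid.

```php
<?php
// Cheap pre-check for a zlib (RFC 1950) header, as produced by gzcompress().
function has_zlib_header(string $data): bool
{
    if (strlen($data) < 2) {
        return false;
    }
    $cmf = ord($data[0]); // first byte: CM (low 4 bits) and CINFO (high 4 bits)
    $flg = ord($data[1]); // second byte: FCHECK, FDICT, FLEVEL

    return ($cmf & 0x0F) === 8                // CM = 8 means deflate
        && ($cmf & 0x80) === 0                // high bit clear, i.e. CINFO <= 7
        && (($cmf << 8) | $flg) % 31 === 0;   // big-endian 16-bit value divisible by 31
}

var_dump(has_zlib_header(gzcompress('abc')));  // bool(true)
var_dump(has_zlib_header('plain text'));       // bool(false)
var_dump(has_zlib_header(gzencode('abc')));    // bool(false)
```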
Upvotes: 0
Reputation:
PRE:
I guess, if you send a request, you can immediately look into $http_response_header to see if one of the items in the array is a variation of Content-Encoding: gzip. But this is not ideal!
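For reference, that header-based check could look like the sketch below. The helper name is hypothetical; it takes the raw header lines in the shape PHP's HTTP stream wrapper stores in $http_response_header after a file_get_contents() call. One reason it is not ideal: after redirects, that array holds the headers of every response in the chain, so a match may come from an earlier response.

```php
<?php
// Hypothetical helper: scans raw response header lines for a gzip Content-Encoding.
function has_gzip_content_encoding(array $headers): bool
{
    foreach ($headers as $line) {
        // Header names are case-insensitive; x-gzip is a legacy alias of gzip.
        if (preg_match('/^Content-Encoding:\s*(?:x-)?gzip\b/i', $line) === 1) {
            return true;
        }
    }
    return false;
}

// Example input shaped like $http_response_header:
$headers = ['HTTP/1.1 200 OK', 'Content-Type: text/html', 'content-encoding: GZIP'];
var_dump(has_gzip_content_encoding($headers)); // bool(true)
```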
There is a far better method.
Here is HOW TO...
Check if it's GZIP. Like a BOSS!
According to the GZIP RFC (RFC 1952):
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
ID1 and ID2 identify the content as GZIP, and CM gives the compression method. The value 8 means deflate, which is the only method RFC 1952 defines and what web servers use for GZIP.
Oh! And they have fixed values: "\x1f", "\x8b" and "\x08" (or just 8...).

<?php
/** @link https://gist.github.com/eladkarako/d8f3addf4e3be92bae96#file-checking_gzip_like_a_boss-php */
date_default_timezone_set("Asia/Jerusalem");
while (ob_get_level() > 0) ob_end_flush();
mb_language("uni");
@mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'en_US.UTF-8');
header('Time-Zone: Asia/Jerusalem');
header('Charset: UTF-8');
// No "Content-Encoding" header here: it names a compression scheme
// (gzip, deflate, ...), not a character set.
header('Content-Type: text/plain; charset=UTF-8');
header('Access-Control-Allow-Origin: *');

function get($url, $cookie = '') {
  $html = @file_get_contents($url, false, stream_context_create([
    'http' => [
      'method' => "GET",
      'header' => implode("\r\n", [
          'Pragma: no-cache'
        , 'Cache-Control: no-cache'
        , 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2310.0 Safari/537.36'
        , 'DNT: 1'
        , 'Accept-Language: en-US,en;q=0.8'
        , 'Accept: text/plain'
        , 'X-Forwarded-For: ' . implode(', ', array_unique(array_filter(array_map(function ($item) { return filter_input(INPUT_SERVER, $item, FILTER_SANITIZE_SPECIAL_CHARS); }, ['HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR', 'HTTP_CLIENT_IP', 'SERVER_ADDR', 'REMOTE_ADDR']), function ($item) { return null !== $item; })))
        , 'Referer: http://eladkarako.com'
        , 'Connection: close'
        , 'Cookie: ' . $cookie
        , 'Accept-Encoding: gzip'
      ])
    ]]));
  // GZIP magic: ID1 = 0x1f, ID2 = 0x8b, CM = 0x08 (deflate).
  // Compare raw bytes; mb_* functions are meant for text, not binary data.
  $is_gzip = is_string($html) && 0 === strncmp($html, "\x1f\x8b\x08", 3);
  // zlib_decode() auto-detects gzip/zlib/raw deflate. Its second argument is a
  // maximum output length, not an encoding constant, so it is omitted.
  return $is_gzip ? zlib_decode($html) : $html;
}

$html = get('http://www.pogdesign.co.uk/cat/');
echo $html;
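Stripped of the HTTP setup, the check in get() boils down to the sketch below. Both function names are hypothetical; gzdecode() is used here instead of zlib_decode() because the data is already known to be gzip, and on a decoding failure this sketch falls back to the raw body (a design choice you may want to change).

```php
<?php
// Compare the first three bytes with the fixed GZIP values ID1, ID2, CM.
function is_gzip(string $data): bool
{
    return strncmp($data, "\x1f\x8b\x08", 3) === 0;
}

// Hypothetical helper: decode only when the body is actually GZIP.
function maybe_gunzip(string $body): string
{
    if (!is_gzip($body)) {
        return $body;
    }
    $decoded = @gzdecode($body); // false on corrupted data
    return $decoded === false ? $body : $decoded;
}

echo maybe_gunzip(gzencode('compressed body')), "\n"; // compressed body
echo maybe_gunzip('plain body'), "\n";                // plain body
```

Note that is_gzip() returns false for gzcompress() output, which is a zlib stream rather than GZIP.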
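Notes: everything is set to UTF-8. The GZIP check is still needed, since we don't really know whether the web server will return GZIP content: Accept-Encoding: gzip only tells the web server that it may output GZIP content. The decoding itself is done with PHP's ZLIB methods.
Upvotes: 29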
Reputation: 21
This works fine for me:
if (@gzuncompress($_xml) !== false) {
    // zlib-compressed string (gzcompress() output)
}
Upvotes: 2
Reputation: 522597
A string and a compressed string are both simply sequences of bytes. You cannot really distinguish one sequence of bytes from another sequence of bytes. You should know whether a blob of bytes represents a compressed format or not from accompanying metadata.
If you really need to guess programmatically, you have several things you can try:
- Check for control bytes below 0x20 (other than tabs and line breaks). Those bytes aren't typically used in regular text. There's no real guarantee that they occur in a compressed string, though.
- Use mb_check_encoding to see whether a string is valid in the encoding you suspect it to be in. If it isn't, it's probably compressed (or you checked for the wrong encoding). The caveat is that virtually any byte sequence is valid in virtually every single-byte encoding, so this only works for multi-byte encodings.
Upvotes: 9