Reputation: 3042
I'm looking for a command line wrapper for the DEFLATE algorithm.
I have a file (git blob) that is compressed using DEFLATE, and I want to uncompress it. The gzip command does not seem to have an option to directly use the DEFLATE algorithm, rather than the gzip format.
Ideally I'm looking for a standard Unix/Linux tool that can do this.
edit: This is the output I get when trying to use gzip for my problem:
$ cat .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 | gunzip
gzip: stdin: not in gzip format
Upvotes: 95
Views: 85183
Reputation: 21
I have repeatedly come across this problem, and it seems almost all of the answers on the Internet either are wrong, require compiling some less than ideal code, or require downloading a whole slew of dependencies untracked by the system! But I found a real solution. It uses Perl, since Perl is readily available on most systems.
From a Bash-alike shell:
perl -mIO::Uncompress::RawInflate=rawinflate -erawinflate'"-","-"'
Or, if you're exec/fork-ing manually (without shell quotes, but line separated):
perl
-mIO::Uncompress::RawInflate=rawinflate
-erawinflate"-","-"
Big caveat: if the stream doesn't start off as a valid DEFLATE stream (say, uncompressed data), this command will happily pipe all the data through untouched. The command only errors (somehow) when the stream does begin as a valid DEFLATE stream (with a valid dictionary, I suppose? I'm not too sure...). In some situations this may be desirable, however. The pass-through appears to be the Transparent option of IO::Uncompress, which is on by default; adding Transparent => 0 to the rawinflate call should make it fail instead.
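If you would rather get an error than a silent pass-through, here is a minimal Python sketch of the same raw inflate, assuming Python 3 and only the standard zlib module (the function name raw_inflate is made up for illustration). Passing wbits=-15 tells zlib to expect raw DEFLATE data with no zlib or gzip wrapper, and zlib raises zlib.error on input it cannot decode instead of copying it through.

```python
import zlib

def raw_inflate(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream (no zlib or gzip wrapper).

    wbits=-15 selects raw DEFLATE with a 32 KiB window. Unlike the Perl
    one-liner's default behaviour, invalid input raises zlib.error.
    """
    return zlib.decompress(data, -15)
```

For example, raw_inflate(b"\x4b\x2c\x4e\x49\x03\x00") gives b"asdf", while raw_inflate(b"hello") raises zlib.error. To use it as a filter, call raw_inflate(sys.stdin.buffer.read()) and write the result to sys.stdout.buffer.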
References:
Perl IO::Uncompress::RawInflate::rawinflate
Upvotes: 2
Reputation: 6147
Pythonic one-liner (updated for Python 3's sharp distinction between text and binary data):
$ python3 -c "import zlib,sys;\
sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < $IN
Upvotes: 44
Reputation: 4717
To add to the collection, here are Perl one-liners for deflate/inflate/raw deflate/raw inflate.
Deflate
perl -MIO::Compress::Deflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Compress::Deflate::deflate(\$in, \$out); print $out;'
Inflate
perl -MIO::Uncompress::Inflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Uncompress::Inflate::inflate(\$in, \$out); print $out;'
Raw deflate
perl -MIO::Compress::RawDeflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Compress::RawDeflate::rawdeflate(\$in, \$out); print $out;'
Raw inflate
perl -MIO::Uncompress::RawInflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Uncompress::RawInflate::rawinflate(\$in, \$out); print $out;'
Upvotes: 1
Reputation: 6210
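const zlib = require("zlib");
const adler32 = require("adler32"); // third-party npm package
const data = "hello world~!";
// A zlib stream is a 2-byte header (78 9c at the default level), the raw
// DEFLATE data, and the big-endian Adler-32 of the uncompressed input.
// ">>> 0" forces an unsigned value; padStart keeps all 8 hex digits.
const chksum = (adler32.sum(Buffer.from(data)) >>> 0).toString(16).padStart(8, "0");
console.log("789c", zlib.deflateRawSync(data).toString("hex"), chksum);
// or let Node add the header and trailer itself
console.log(zlib.deflateSync(data).toString("hex"));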
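The first console.log assembles a zlib stream by hand: header, raw DEFLATE data, Adler-32 trailer. A minimal Python sketch of the same layout, assuming the default compression level (which is what produces the 78 9c header) and using the standard library's zlib.adler32 in place of the npm package; zlib_by_hand is a made-up name:

```python
import struct
import zlib

def zlib_by_hand(data: bytes) -> bytes:
    """Build a zlib stream from its parts: header, raw DEFLATE, Adler-32."""
    # Raw DEFLATE (wbits=-15) at the default level, window size and memLevel,
    # so the compressed bytes match what zlib.compress() produces internally.
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    raw = c.compress(data) + c.flush()
    header = b"\x78\x9c"  # CMF/FLG for a 32 KiB window and the default level
    # The trailer is the Adler-32 of the *uncompressed* data, big-endian.
    trailer = struct.pack(">I", zlib.adler32(data))
    return header + raw + trailer
```

zlib_by_hand(b"hello world~!").hex() should print the same bytes as zlib.compress(b"hello world~!").hex(), which is the Python counterpart of deflateSync.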
Upvotes: 0
Reputation: 949
I found this question while looking for a work-around for a bug in the -text utility in the new version of the hadoop dfs client I had just installed. The -text utility works like cat, except that if the file being read is compressed, it transparently decompresses it and outputs the plain text (hence the name).
The answers already posted were definitely helpful, but some of them have one problem when dealing with Hadoop-sized amounts of data: they read everything into memory before decompressing.
So, here are my variations on the Perl and Python answers above that do not have that limitation:
Python:
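hadoop fs -cat /path/to/example.deflate |
python3 -c 'import zlib,sys;d=zlib.decompressobj();[sys.stdout.buffer.write(d.decompress(b)) for b in iter(lambda:sys.stdin.buffer.read(4096),b"")];sys.stdout.buffer.write(d.flush())'
Perl:
hadoop fs -cat /path/to/example.deflate |
perl -MCompress::Zlib -e '$i=inflateInit(); while (sysread(STDIN,$buf,4096)) { ($out,$status)=$i->inflate($buf); print $out }'
Both keep a single decompressor object across chunks: decompressing each 4096-byte chunk on its own (with zlib.decompress or uncompress) fails, because a chunk boundary can fall anywhere inside the compressed stream.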
Note the use of the -cat sub-command instead of -text. This is so that my work-around does not break after they've fixed the bug. Apologies for the readability of the Python version.
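Spelled out as a function, the memory-bounded approach looks roughly like this. It is a sketch assuming Python 3 and a single zlib stream on the input (as Hadoop's .deflate files and git objects are); inflate_stream and the 4096-byte default are illustrative choices, not anything Hadoop defines.

```python
import zlib
from typing import BinaryIO

def inflate_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 4096) -> None:
    """Decompress one zlib stream from src to dst without reading it all into memory."""
    # One decompressor object keeps its state between chunks, so a chunk
    # boundary may fall anywhere inside the compressed data.
    d = zlib.decompressobj()
    while not d.eof:
        chunk = src.read(chunk_size)
        if not chunk:
            raise ValueError("truncated zlib stream")
        dst.write(d.decompress(chunk))
    dst.write(d.flush())
```

As a filter: inflate_stream(sys.stdin.buffer, sys.stdout.buffer). Raising on a premature end of input, rather than silently stopping, makes truncated downloads visible.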
Upvotes: 1
Reputation: 97
Python 3 one-liner:
python3 -c "import zlib,sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < infile > outfile
This way the contents are handled as binary data, avoiding conversion to/from Unicode.
Upvotes: 2
Reputation: 269
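// save this as deflate.go
package main
import (
	"compress/zlib"
	"flag"
	"io"
	"log"
	"os"
)
var infile = flag.String("f", "", "infile")
func main() {
	flag.Parse()
	file, err := os.Open(*infile)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// git objects are zlib streams, so compress/zlib (not compress/flate) is the right reader.
	r, err := zlib.NewReader(file)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()
	if _, err := io.Copy(os.Stdout, r); err != nil {
		log.Fatal(err)
	}
}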
$ go build deflate.go
$ ./deflate -f .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7
Upvotes: 7
Reputation: 133712
Something like the following will print the raw content, including the "$type $length\0" header:
perl -MCompress::Zlib -e 'undef $/; print uncompress(<>)' \
< .git/objects/27/de0a1dd5a89a94990618632967a1c86a82d577
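The "$type $length\0" header can be split off programmatically. A minimal Python sketch, assuming a loose (not packed) object read into memory; read_loose_object is a hypothetical helper name, and the size check is an extra sanity test, not something git requires you to do:

```python
import zlib
from typing import Tuple

def read_loose_object(compressed: bytes) -> Tuple[str, int, bytes]:
    """Split a loose git object into (type, declared size, body)."""
    raw = zlib.decompress(compressed)          # loose objects are zlib streams
    header, sep, body = raw.partition(b"\x00")  # header ends at the first NUL
    if not sep:
        raise ValueError("missing NUL after object header")
    obj_type, _, size = header.decode("ascii").partition(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, int(size), body
```

Usage: open .git/objects/27/de0a1dd5a89a94990618632967a1c86a82d577 in binary mode ("rb") and pass its contents to read_loose_object; the body is what git cat-file -p would show for a blob.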
Upvotes: 54
Reputation: 4061
UPDATE: Mark Adler noted that git blobs are not raw DEFLATE streams, but zlib streams. These can be unpacked by the pigz tool, which comes pre-packaged in several Linux distributions:
$ cat foo.txt
file foo.txt!
$ git ls-files -s foo.txt
100644 7a79fc625cac65001fb127f468847ab93b5f8b19 0 foo.txt
$ pigz -d < .git/objects/7a/79fc625cac65001fb127f468847ab93b5f8b19
blob 14file foo.txt!
Edit by kriegaex: Git Bash for Windows users will notice that pigz is unavailable by default. You can find precompiled 32/64-bit versions here. I tried the 64-bit version and it works nicely. You can, e.g., copy pigz.exe directly to C:\Program Files\Git\usr\bin in order to put it on the path.
Edit by mjaggard: Homebrew and MacPorts both have pigz available, so you can install it with brew install pigz or sudo port install pigz (if you do not have Homebrew already, you can install it by following the instructions on its website).
My original answer, kept for historical reasons:
If I understand the hint in the Wikipedia article mentioned by Marc van Kempen, you can use puff.c from zlib directly.
This is a small example:
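#include <assert.h>
#include <string.h>
#include "puff.h"   /* from zlib's contrib/puff directory */
int main( int argc, char **argv ) {
    unsigned char dest[ 5 ];
    unsigned long destlen = 4;   /* room for the 4 output bytes */
    /* raw DEFLATE encoding of "asdf" (no zlib/gzip header or trailer) */
    const unsigned char *source = (const unsigned char *)"\x4B\x2C\x4E\x49\x03\x00";
    unsigned long sourcelen = 6;
    /* call puff() outside assert() so it still runs when NDEBUG is defined */
    int ret = puff( dest, &destlen, source, &sourcelen );
    assert( ret == 0 );
    dest[ 4 ] = '\0';
    assert( strcmp( (const char *)dest, "asdf" ) == 0 );
    return 0;
}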
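puff.c is a small, readable inflater. To show what such a decoder does with those six bytes, here is a much smaller Python sketch. To keep it short it assumes the input uses only stored and fixed-Huffman blocks and rejects dynamic-Huffman blocks (which most real compressors emit), so treat it as a teaching toy, not a replacement for puff or zlib. The length/distance tables follow RFC 1951, sections 3.2.5 and 3.2.6; all names are made up.

```python
# Toy raw-DEFLATE decoder: stored and fixed-Huffman blocks only.
LEN_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LEN_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
             257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
             8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0] + [i // 2 for i in range(28)]

class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def bits(self, n: int) -> int:
        """Read n bits, least-significant first (how header fields and extra bits are packed)."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]  # IndexError on truncated input
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def huffman_bits(self, n: int) -> int:
        """Read n bits most-significant first (how Huffman codes are packed)."""
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits(1)
        return value

def read_fixed_litlen(br: BitReader) -> int:
    """Decode one literal/length symbol with the fixed Huffman code."""
    code = br.huffman_bits(7)
    if code <= 0b0010111:                    # 7-bit codes: symbols 256-279
        return code + 256
    code = (code << 1) | br.bits(1)
    if 0b00110000 <= code <= 0b10111111:     # 8-bit codes: symbols 0-143
        return code - 0b00110000
    if 0b11000000 <= code <= 0b11000111:     # 8-bit codes: symbols 280-287
        return code - 0b11000000 + 280
    code = (code << 1) | br.bits(1)          # 9-bit codes: symbols 144-255
    return code - 0b110010000 + 144

def toy_inflate(data: bytes) -> bytes:
    br = BitReader(data)
    out = bytearray()
    final = 0
    while not final:
        final = br.bits(1)
        btype = br.bits(2)
        if btype == 0:                       # stored block
            br.pos = (br.pos + 7) & ~7       # skip to the next byte boundary
            length, nlength = br.bits(16), br.bits(16)
            if length != (~nlength & 0xFFFF):
                raise ValueError("invalid stored block lengths")
            start = br.pos >> 3
            if start + length > len(data):
                raise ValueError("truncated stored block")
            out += data[start:start + length]
            br.pos += 8 * length
        elif btype == 1:                     # fixed Huffman block
            while True:
                sym = read_fixed_litlen(br)
                if sym < 256:
                    out.append(sym)          # literal byte
                elif sym == 256:
                    break                    # end of block
                else:
                    if sym > 285:
                        raise ValueError("invalid length symbol")
                    i = sym - 257
                    length = LEN_BASE[i] + br.bits(LEN_EXTRA[i])
                    d = br.huffman_bits(5)   # fixed distance codes are 5 bits
                    if d > 29:
                        raise ValueError("invalid distance symbol")
                    dist = DIST_BASE[d] + br.bits(DIST_EXTRA[d])
                    if dist > len(out):
                        raise ValueError("distance too far back")
                    for _ in range(length):  # byte by byte, so overlapping copies work
                        out.append(out[-dist])
        else:
            raise NotImplementedError("dynamic Huffman (or reserved) blocks not handled")
    return bytes(out)
```

toy_inflate(b"\x4b\x2c\x4e\x49\x03\x00") returns b"asdf", the same result the C program asserts.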
Upvotes: 42
Reputation: 6981
This is how I do it with PowerShell.
$fs = New-Object IO.FileStream((Resolve-Path $Path), [IO.FileMode]::Open, [IO.FileAccess]::Read)
$fs.Position = 2  # skip the 2-byte zlib header; DeflateStream only reads raw DEFLATE
$cs = New-Object IO.Compression.DeflateStream($fs, [IO.Compression.CompressionMode]::Decompress)
$sr = New-Object IO.StreamReader($cs)
$sr.ReadToEnd()
You can then create an alias like:
function func_deflate{
param(
[Parameter(Mandatory=$true, ValueFromPipeline = $true)]
[ValidateScript({Test-Path $_ -PathType leaf})]
[string]$Path
)
$ErrorActionPreference = 'Stop'
$fs = New-Object IO.FileStream((Resolve-Path $Path), [IO.FileMode]::Open, [IO.FileAccess]::Read)
$fs.Position = 2
$cs = New-Object IO.Compression.DeflateStream($fs, [IO.Compression.CompressionMode]::Decompress)
$sr = New-Object IO.StreamReader($cs)
return $sr.ReadToEnd()
}
Set-Alias -Name deflate -Value func_deflate
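Skipping the first two bytes works because a zlib stream is just that header, raw DEFLATE data, and a 4-byte Adler-32 trailer, which DeflateStream never looks at. A Python sketch of the same trick that also checks the trailer, assuming the stream has no preset dictionary (true for git objects); inflate_skipping_header is a hypothetical name:

```python
import struct
import zlib

def inflate_skipping_header(data: bytes) -> bytes:
    """Skip the 2-byte zlib header, raw-inflate, then verify the Adler-32 trailer."""
    d = zlib.decompressobj(-15)                # raw DEFLATE, like .NET's DeflateStream
    out = d.decompress(data[2:]) + d.flush()
    if not d.eof:
        raise ValueError("truncated DEFLATE data")
    trailer = d.unused_data                    # bytes after the DEFLATE data: the checksum
    if len(trailer) < 4 or struct.unpack(">I", trailer[:4])[0] != zlib.adler32(out):
        raise ValueError("Adler-32 checksum mismatch")
    return out
```

The checksum step is what the PowerShell version silently skips; dropping it gives the same output as the alias above for an intact file.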
Upvotes: 1
Reputation: 112502
git objects are zlib streams (not raw deflate). pigz will decompress those with the -dz option.
Upvotes: 4
Reputation: 10926
You can do this with the OpenSSL command line tool:
openssl zlib -d < $IN > $OUT
Unfortunately, at least on Ubuntu, the zlib subcommand is disabled in the default build configuration (--no-zlib --no-zlib-dynamic), so you would need to compile openssl from source to use it. But it is enabled by default on Arch, for example.
Edit: It seems the zlib command is no longer supported on Arch either. This answer might not be useful anymore :(
Upvotes: 55
Reputation: 5557
pigz can do it:
apt-get install pigz
unpigz -c .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7
Upvotes: 4
Reputation: 166785
Try the following command:
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 | gunzip
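No external tools are needed. The 8 bytes from printf are the start of a gzip header, and the object's own 2-byte zlib header fills the last two gzip header bytes, so gunzip finds the DEFLATE data where it expects it. Because the stream ends with an Adler-32 checksum rather than gzip's CRC-32 and length trailer, gunzip will likely report an error (e.g. "unexpected end of file") after writing the output.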
Source: How to uncompress zlib data in UNIX? at unix SE
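The header trick can be reproduced with Python's zlib in gzip-only mode, which makes the byte layout easier to see. This is a sketch under one stated assumption: the 4-byte Adler-32 trailer is cut off first, because zlib's gzip decoder would read it as a CRC-32 and reject it. gunzip_with_fake_header is a made-up name.

```python
import zlib

# 8 bytes: gzip magic (1f 8b), method 8 = deflate, no flags, zero mtime.
GZIP_PREFIX = b"\x1f\x8b\x08\x00\x00\x00\x00\x00"

def gunzip_with_fake_header(zlib_data: bytes) -> bytes:
    """Decode a zlib stream by disguising it as gzip, like the printf trick."""
    # The zlib stream's own 2-byte header lands in gzip's XFL and OS fields,
    # completing the 10-byte gzip header, so the DEFLATE data lines up.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # accept gzip format only
    # Drop the Adler-32 trailer; gzip mode would check it as a CRC-32 and fail.
    return d.decompress(GZIP_PREFIX + zlib_data[:-4])
```

The decoder never sees a gzip trailer, so it returns the data without verifying any checksum, much like gunzip, which has already written the output by the time it notices the missing trailer.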
Upvotes: 30
Reputation: 2804
You can use zlib-flate, like this:
cat .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 \
| zlib-flate -uncompress; echo
It's there by default on my machine, but if you need to install it, it's part of qpdf (tools for transforming and inspecting PDF files).
I've added an echo at the end of the command because it makes the output easier to read.
Upvotes: 30
Reputation: 189
I got tired of not having a good solution for this, so I put something on NPM:
https://github.com/jezell/zlibber
Now you can just pipe to the inflate / deflate commands.
Upvotes: 12
Reputation: 439
It looks like Mark Adler had us in mind and wrote an example of exactly how to do this: http://www.zlib.net/zpipe.c
It compiles with nothing more than gcc zpipe.c -o zpipe -lz and the zlib headers installed; zpipe -d < source > dest decompresses. I copied the resulting binary to /usr/local/bin/zpipe while working with git stuff.
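Without -d, zpipe compresses stdin to stdout in fixed-size chunks. A rough Python sketch of that compression half, assuming Python 3's zlib; the 16 KiB chunk size mirrors zpipe.c's CHUNK, and deflate_stream is a hypothetical name:

```python
import zlib
from typing import BinaryIO

CHUNK = 16384  # zpipe.c also works in 16 KiB chunks

def deflate_stream(src: BinaryIO, dst: BinaryIO,
                   level: int = zlib.Z_DEFAULT_COMPRESSION) -> None:
    """Compress src into one zlib stream on dst, chunk by chunk (like zpipe without -d)."""
    c = zlib.compressobj(level)
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(c.compress(chunk))
    dst.write(c.flush())  # finish the stream: remaining data plus the Adler-32 trailer
```

Called as deflate_stream(sys.stdin.buffer, sys.stdout.buffer), it produces a zlib stream that zpipe -d (or any of the inflaters above) can read back.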
Upvotes: 9
Reputation: 1713
git objects are compressed with zlib rather than gzip, so either use zlib to uncompress them, or use a git command, e.g. git cat-file -p <SHA1>, to print the content.
Upvotes: 9
Reputation: 8487
Here is a Ruby one-liner (cd into .git/objects first and identify the path to any object):
ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < ./74/c757240ec596063af8cd273ebd9f67073e1208
Upvotes: 14
Reputation: 101
Here's an example of breaking open a commit object in Python:
$ git show
commit 0972d7651ff85bedf464fba868c2ef434543916a
# all the junk in my commit...
$ python3
>>> import zlib
>>> with open(".git/objects/09/72d7651ff85bedf464fba868c2ef434543916a", "rb") as f:
...     data = f.read()
...
>>> print(data)
# binary garbage
>>> unzipped_data = zlib.decompress(data)
>>> print(unzipped_data.decode())
# all the junk in my commit!
What you will see there is almost identical to the output of 'git cat-file -p [hash]', except that command doesn't print the header ('commit' followed by the size of the content and a null byte).
Upvotes: 10
Reputation: 28259
Why don't you just use git's tools to access the data? This should be able to read any git object:
git show --pretty=raw <object SHA-1>
Upvotes: 1
Reputation: 556
See http://en.wikipedia.org/wiki/DEFLATE#Encoder_implementations
It lists a number of software implementations, including gzip, so that should work. Did you try just running gzip on the file? Does it not recognize the format automatically?
How do you know it is compressed using DEFLATE? What tool was used to compress the file?
Upvotes: 1