A. Rex

Reputation: 31981

How do I transparently compress/decompress a file as a program writes to/reads from it?

I have a program that reads and writes very large text files. However, because of the format of these files (they are ASCII representations of what should have been binary data), these files are actually very easily compressed. For example, some of these files are over 10GB in size, but gzip achieves 95% compression.

I can't modify the program but disk space is precious, so I need to set up a way that it can read and write these files while they're being transparently compressed and decompressed.

The program can only read and write files, so as far as I understand, I need to set up a named pipe for both input and output. Some people are suggesting a compressed filesystem instead, which seems like it would work, too. How do I make either work?

Technical information: I'm on a modern Linux. The program reads a separate input and output file. It reads through the input file in order, though twice. It writes the output file in order.

Upvotes: 8

Views: 4257

Answers (5)

rogerdpack

Reputation: 66751

btrfs:

https://btrfs.wiki.kernel.org/index.php/Main_Page

provides support for pretty fast "automatic transparent compression/decompression" these days, and is present (though marked experimental) in newer kernels.
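
As a rough sketch, assuming you have a spare block device that is already formatted as btrfs, compression is switched on with a mount option. The device /dev/sdX1 and the mount point /mnt/compressed are placeholders of mine, not something from the question, and the commands need root:

```shell
# Requires root. /dev/sdX1 and /mnt/compressed are hypothetical placeholders.
mkdir -p /mnt/compressed
# compress=zlib asks btrfs to compress file data as it is written.
mount -t btrfs -o compress=zlib /dev/sdX1 /mnt/compressed
```

Files the program writes under that mount point are then compressed on disk, while the program itself still sees ordinary, seekable files.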

Upvotes: 2

shodanex

Reputation: 15406

Named pipes won't give you full-duplex operation, so things get a bit more complicated if the program takes only a single filename for both reading and writing.

Do you know whether your application needs to seek within the file?

Does your application work with stdin and stdout?

One option is to create a small compressed filesystem that contains only a directory with your files.

Since you have separate input and output files, you can do the following:

mkfifo readfifo
mkfifo writefifo
zcat yourinputfile > readfifo &
gzip < writefifo > youroutputfile &

Then launch your program, giving it readfifo as the input file and writefifo as the output file.

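Now, you will probably run into trouble with reading the input twice in order. A FIFO cannot be rewound: once zcat has finished and closed readfifo, your program sees end-of-file, and when it opens readfifo again for the second pass it blocks until something starts writing to it again.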
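
One way to cope with the two passes is to feed the FIFO once per pass. The following is only a sketch under assumptions: it assumes the program closes and reopens its input between passes (rather than seeking), fake_program is a made-up stand-in for the real program, and all file names are placeholders. The sleep between the two feeds is a heuristic, not proper synchronisation.

```shell
#!/bin/sh
# Sketch only: fake_program and the file names are placeholders.
set -e

workdir=$(mktemp -d)
input="$workdir/input.gz"
output="$workdir/output.gz"
readfifo="$workdir/readfifo"
writefifo="$workdir/writefifo"

# Small compressed sample standing in for the real input file.
printf 'line1\nline2\n' | gzip -c > "$input"

mkfifo "$readfifo" "$writefifo"

# Stand-in for the real program: opens the input twice, reading it in
# order each time, and writes the output once, in order.
fake_program() {
    { cat "$1"; cat "$1"; } > "$2"
}

# Feed the input FIFO once per pass. Each open-for-write blocks until the
# program opens the FIFO for reading. The sleep is a heuristic that gives
# the program time to see end-of-file and close the FIFO before the second
# writer opens it; it is not a real synchronisation mechanism.
(
    gzip -dc "$input" > "$readfifo"
    sleep 1
    gzip -dc "$input" > "$readfifo"
) &

# Compress whatever the program writes to the output FIFO.
gzip -c < "$writefifo" > "$output" &

fake_program "$readfifo" "$writefifo"
wait   # let the feeder and the compressor finish

# Decompress the result for inspection, then clean up.
result=$(gzip -dc "$output")
printf '%s\n' "$result"
rm -rf "$workdir"
```

If the second feed starts before the program has closed the FIFO after the first pass, both passes' data arrive as one stream and the second open hangs, so treat the sleep as a stopgap. If the program seeks instead of reopening, FIFOs will not work at all.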

The proper solution is probably to use a compressed filesystem like compFUSEd, because then you don't have to worry about unsupported operations like seek.

Upvotes: 2

trshiv

Reputation: 2505

Which language are you using?

If you are using Java, take a look at the GZIPInputStream and GZIPOutputStream classes (in java.util.zip) in the API doc.

If you are using C/C++, zlibc is probably the best way to go about it.

Upvotes: 0

EFraim

Reputation: 13028

Check out zlibc: http://zlibc.linux.lu/.

Also, if FUSE is an option (i.e. the kernel is not too old), consider: compFUSEd http://www.biggerbytes.be/

Upvotes: 5
