I can compress MapReduce output to gzip by setting "mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec".
Would it be straightforward to implement a zip codec for Hadoop? Zip is a container format, but I need only one file per archive, so would it be easy to create a ZipCodec that implements the CompressionCodec interface?
Or is there an efficient way to convert gz files to zip files, since both can use the same deflate algorithm?
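The simplest conversion inflates the gzip data and deflates it again into a zip archive with a single entry. This is a minimal sketch using only java.util.zip, assuming each .gz file holds one logical file. It does not reuse the existing deflate blocks: java.util.zip.ZipOutputStream always compresses the bytes it is given, so copying the compressed data unchanged would mean writing the zip headers yourself. The class GzToZip, the method convert and the entry name are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of Hadoop): re-packages one gzip stream
// as a zip archive with exactly one entry.
final class GzToZip {
    private GzToZip() {}

    // Inflates gzIn and writes the plain bytes into a single zip entry.
    // Note: the data is decompressed and recompressed; the original
    // deflate blocks are NOT copied. Both streams are closed on return.
    static void convert(InputStream gzIn, OutputStream zipOut, String entryName)
            throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzIn);
             ZipOutputStream out = new ZipOutputStream(zipOut)) {
            out.putNextEntry(new ZipEntry(entryName));
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.closeEntry();
        }
    }
}
```

Decompressing and recompressing costs CPU, but it needs nothing beyond the JDK and yields a normal zip file that any unzip tool can read.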
No big deal, you can wrap a java.util.zip.ZipOutputStream.
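To make the wrapping concrete, here is a minimal sketch of the write side using only the JDK. It assumes one entry per archive with a caller-chosen name. SingleEntryZipOutputStream is a hypothetical class, not part of Hadoop or java.util.zip. It looks like a plain compressed stream to its caller, which is the kind of stream a Hadoop CompressorStream subclass could forward its writes and its finish call to.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: everything written here ends up in ONE zip entry.
final class SingleEntryZipOutputStream extends OutputStream {
    private final ZipOutputStream zip;
    private boolean finished;

    SingleEntryZipOutputStream(OutputStream out, String entryName) throws IOException {
        this.zip = new ZipOutputStream(out);
        // Open the only entry up front so callers can simply start writing.
        this.zip.putNextEntry(new ZipEntry(entryName));
    }

    @Override
    public void write(int b) throws IOException {
        zip.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        zip.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        zip.flush();
    }

    // Closes the entry and writes the zip central directory without
    // closing the underlying stream. Safe to call more than once.
    void finish() throws IOException {
        if (!finished) {
            zip.closeEntry();
            zip.finish();
            finished = true;
        }
    }

    @Override
    public void close() throws IOException {
        finish();
        zip.close(); // also closes the wrapped OutputStream
    }
}
```

Keeping finish() separate from close() reflects the assumption that a framework may want the archive completed while it still owns and closes the underlying file stream itself.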
You do this by implementing your own codec, for example by extending org.apache.hadoop.io.compress.DefaultCodec.
In this codec, you wrap the Java zip output and input streams by extending org.apache.hadoop.io.compress.CompressorStream and org.apache.hadoop.io.compress.DecompressorStream, respectively.
Finally, override the createInputStream and createOutputStream methods so that they return new instances of the wrapped streams.
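For the read side, a stream returned from createInputStream would need to skip the zip local header and expose only the entry's contents. This is a minimal JDK-only sketch, assuming the archive holds a single entry; SingleEntryZipInputStream is a hypothetical class, and how it would be plugged into a DecompressorStream subclass is left out because that depends on the Hadoop version's constructors.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads the first (and only expected) entry of a zip
// archive as if it were a plain stream.
final class SingleEntryZipInputStream extends FilterInputStream {
    private final String entryName;

    SingleEntryZipInputStream(InputStream source) throws IOException {
        super(new ZipInputStream(source));
        // Position the ZipInputStream at the start of the first entry's data.
        ZipEntry entry = ((ZipInputStream) this.in).getNextEntry();
        if (entry == null) {
            this.in.close();
            throw new IOException("zip archive contains no entries");
        }
        this.entryName = entry.getName();
    }

    // Name of the entry being read, e.g. for logging.
    String entryName() {
        return entryName;
    }
}
```

Reads are delegated to ZipInputStream, which returns -1 at the end of the current entry, so any further entries in the archive are silently ignored under this single-entry assumption.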
It is still a bit of coding, and I'm pretty sure an implementation already exists somewhere (I seem to recall one being part of a Hadoop release years ago).