Piyush Kansal

Reputation: 1241

How to use Snappy in Hadoop in Container format

I have to use Snappy to compress both the map output and the final MapReduce output. The compressed output should also be splittable.

From what I have read online, to make Snappy write splittable output it has to be used inside a container-like format.
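To illustrate why a container format helps, here is a minimal sketch in plain Java. It is not Hadoop code: `java.util.zip.Deflater` stands in for Snappy so the example has no Hadoop dependency, and the class `BlockContainerSketch` with its list-of-blocks layout is hypothetical, not the actual SequenceFile layout. The idea it shows is that each block is compressed independently, so a reader can start at any block boundary without decompressing what came before.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockContainerSketch {

    // Compress one block on its own; no compressor state is shared between blocks.
    // Deflater is an assumption standing in for Snappy.
    public static byte[] compressBlock(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single block without needing any other block.
    public static byte[] decompressBlock(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical container: records are grouped, and each group becomes one compressed block.
    public static List<byte[]> writeContainer(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            int end = Math.min(i + recordsPerBlock, records.size());
            String joined = String.join("\n", records.subList(i, end));
            blocks.add(compressBlock(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    // Read only the block at blockIndex, as a task assigned to that split would.
    public static String readBlock(List<byte[]> blocks, int blockIndex) {
        return new String(decompressBlock(blocks.get(blockIndex)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> blocks = writeContainer(Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println("blocks: " + blocks.size());
        System.out.println("block 1 alone: " + readBlock(blocks, 1));
    }
}
```

A raw compressed stream, by contrast, can generally only be decoded from its start, which is why it cannot be divided among several map tasks.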

Can you please suggest how to go about it? I tried finding some examples online, but could not find one. I am using Hadoop v0.20.203.

Thanks. Piyush

Upvotes: 4

Views: 5379

Answers (2)

VeLKerr

Reputation: 3157

In the new API, the OutputFormat is set on the Job, not on the configuration. The first part then becomes:

Job job = new Job(conf);
...
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(job, true);

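// Set the codec on the job's own configuration: Job copies conf when it is constructed,
// so calling conf.set(...) after new Job(conf) would not reach the job.
job.getConfiguration().set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");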

Upvotes: 1

root1982

Reputation: 470

For the job output

conf.setOutputFormat(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(conf, true);
conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");

For map output

Configuration conf = new Configuration();
conf.setBoolean("mapred.compress.map.output", true);
conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");

Upvotes: 5
