Reputation: 1241
I need to use Snappy to compress both the map output and the final MapReduce job output. The compressed output must also be splittable.
From what I have read online, Snappy output is only splittable when it is written inside a container-like format.
Can you please suggest how to go about it? I tried to find examples online but could not find one. I am using Hadoop v0.20.203.
Thanks. Piyush
Upvotes: 4
Views: 5379
Reputation: 3157
In the new API, the OutputFormat is set on the Job rather than on the configuration. So the first part becomes:
Job job = new Job(conf);
...
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(job, true);
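// Job copies conf when it is constructed, so set the codec on the job's own configuration
job.getConfiguration().set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");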
Upvotes: 1
Reputation: 470
For the job output (old API, where conf is a JobConf):
conf.setOutputFormat(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(conf, true);
conf.set("mapred.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec");
For the map output:
Configuration conf = new Configuration();
conf.setBoolean("mapred.compress.map.output", true);
conf.set("mapred.map.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec");
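As for why a block-compressed container is splittable: here is a minimal stand-alone sketch of the idea, not Hadoop code. It assumes newline-free records, uses java.util.zip.Deflater as a stand-in for Snappy, and the class and method names (BlockContainerSketch, compressInBlocks, readBlock) are made up for illustration. Each group of records is compressed on its own, so a reader can decode one block without reading the others. A real SequenceFile additionally writes sync markers between blocks so that a reader starting at an arbitrary offset can find the next block boundary; the sketch just keeps the blocks in a list.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Hadoop.
final class BlockContainerSketch {

    // Compress records in groups of recordsPerBlock. Each group becomes one
    // independently decodable block (Deflater stands in for Snappy here).
    static List<byte[]> compressInBlocks(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += recordsPerBlock) {
            int end = Math.min(start + recordsPerBlock, records.size());
            String joined = String.join("\n", records.subList(start, end));
            blocks.add(deflate(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    // Decode a single block without touching any other block.
    static List<String> readBlock(byte[] block) {
        String text = new String(inflate(block), StandardCharsets.UTF_8);
        return new ArrayList<>(Arrays.asList(text.split("\n")));
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read means the block is damaged.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid block");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        List<byte[]> blocks = compressInBlocks(records, 2);
        // A reader handed only the second block can decode it in isolation.
        System.out.println(readBlock(blocks.get(1))); // prints [r3, r4]
    }
}
```

A plain Snappy-compressed text file has no such block boundaries, which is why it cannot be split across mappers, while a SequenceFile with CompressionType.BLOCK can.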
Upvotes: 5