Reputation: 619
In a Hadoop program, I tried to compress the job output with the following code:
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
The result was compressed. When I deleted the first line:
FileOutputFormat.setCompressOutput(job, true);
and ran the program again, the result was the same. Is the line
FileOutputFormat.setCompressOutput(job, true);
optional? What does it do?
Upvotes: 1
Views: 140
Reputation: 29195
Please see the methods below from FileOutputFormat.java. setOutputCompressorClass internally makes the same call that you deleted,
i.e. setCompressOutput(conf, true);
In other words, setting the Gzip codec class already implies that the output should be compressed, so the explicit setCompressOutput call is redundant in your case.
/**
 * Set whether the output of the job is compressed.
 * @param conf the {@link JobConf} to modify
 * @param compress should the output of the job be compressed?
 */
public static void setCompressOutput(JobConf conf, boolean compress) {
  conf.setBoolean("mapred.output.compress", compress);
}

/**
 * Set the {@link CompressionCodec} to be used to compress job outputs.
 * @param conf the {@link JobConf} to modify
 * @param codecClass the {@link CompressionCodec} to be used to
 *                   compress the job outputs
 */
public static void
setOutputCompressorClass(JobConf conf,
                         Class<? extends CompressionCodec> codecClass) {
  setCompressOutput(conf, true);
  conf.setClass("mapred.output.compression.codec", codecClass,
                CompressionCodec.class);
}
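To see the effect in isolation, here is a minimal sketch that mimics the two methods above without needing Hadoop on the classpath. The class name CompressionConfigSketch and the use of a plain Map in place of JobConf are assumptions made for illustration, not Hadoop's API; only the two property keys and the order of calls are taken from the source quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileOutputFormat; a Map<String, String> replaces JobConf.
public class CompressionConfigSketch {

    // Mirrors setCompressOutput: stores the compression flag under its property key.
    public static void setCompressOutput(Map<String, String> conf, boolean compress) {
        conf.put("mapred.output.compress", Boolean.toString(compress));
    }

    // Mirrors setOutputCompressorClass: turns compression on first, then records the codec.
    public static void setOutputCompressorClass(Map<String, String> conf, String codecClassName) {
        setCompressOutput(conf, true);
        conf.put("mapred.output.compression.codec", codecClassName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Only the codec is set; setCompressOutput is never called directly here.
        setOutputCompressorClass(conf, "org.apache.hadoop.io.compress.GzipCodec");

        System.out.println(conf.get("mapred.output.compress"));
        System.out.println(conf.get("mapred.output.compression.codec"));
    }
}
```

Running main prints true followed by org.apache.hadoop.io.compress.GzipCodec, which is why dropping the explicit setCompressOutput(job, true) line did not change your result. Keeping the explicit call is harmless and some people leave it in for readability.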
Upvotes: 1