Reputation: 11629
I am experimenting with this parameter in MapReduce and I have a question.
Does it go by the file's size as stored in HDFS (whether or not the file is compressed), or by the size after decompression? I guess it is the former, but I just want to confirm.
Upvotes: 1
Views: 1200
Reputation: 21
As of Hadoop 0.21, I think bzip2 (.bz2) files are splittable, so you can use bz2.
Upvotes: 2
Reputation: 30089
This parameter is only used if your input format supports splitting the input files. Common compression codecs (such as gzip) don't support splitting, so for files compressed that way it is ignored.
If the input format does support splitting, then the parameter relates to the compressed size, i.e. the size of the file as it is stored in HDFS.
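To make that concrete, here is a rough Python sketch of the behaviour, not Hadoop's actual code. It assumes the parameter in question is one of the split-size settings (the thread does not name it). The names `compute_split_size` and `plan_splits` are made up for illustration. The size formula mirrors, as far as I know, what `FileInputFormat` does in the newer `mapreduce` API, and the sketch ignores the small slop Hadoop allows on the last split.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    # Assumed formula: max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))


def plan_splits(stored_length, splittable, split_size):
    """Return (offset, length) pairs over the bytes as stored in HDFS.

    stored_length is the on-disk (possibly compressed) size, never the
    decompressed size.
    """
    if not splittable:
        # e.g. gzip: the split-size setting is ignored and one mapper
        # reads the whole file.
        return [(0, stored_length)]
    splits = []
    offset = 0
    while offset < stored_length:
        length = min(split_size, stored_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    size = compute_split_size(block_size=128 * MB, min_size=1, max_size=64 * MB)
    # A 300 MB file on disk, regardless of how large it is once decompressed.
    print(len(plan_splits(300 * MB, True, size)))
    print(len(plan_splits(300 * MB, False, size)))
```

In both cases the decompressed size never enters the calculation: either the stored bytes are carved up, or the file is taken whole.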
Upvotes: 2