Reputation: 34
I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folders contain a mix of file types, both small and large, so SequenceFile is not a good fit).
I tried options like -D mapreduce.map.output.compress=true -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec
but they had no effect on the archive size.
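For reference, the flags above only compress the intermediate map output, which is discarded after the job; they never touch the final files. To produce compressed files on HDFS, the corresponding MRv2 properties for *job output* compression would look like this (a mapred-site.xml sketch, not taken from the question):

```xml
<!-- mapred-site.xml: compress the final job output, not just map output -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

Note that these properties apply to MapReduce jobs that write output; the `hadoop archive` tool itself stores member files verbatim, so setting them on the archive job alone will not shrink the HAR.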
Does anyone know a way to compress the output of Hadoop Archive, or can you suggest another way to achieve both goals (reduce the size and reduce the number of files)?
Any information is appreciated. Thanks so much.
Upvotes: 2
Views: 544
Reputation: 177
You can use MapReduce output compression to compress the data first, and then run `har` on the compressed directories.
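One way to sketch this two-step approach (the paths `/data/logs` and `/archives` are hypothetical, and the identity-mapper streaming job is just one possible way to rewrite a directory with compressed output):

```shell
# Step 1: rewrite the directory with gzip-compressed output, e.g. via a
# pass-through Hadoop Streaming job (identity mapper, no reducers).
# Caveat: this gzips the file contents, so readers must decompress later,
# and a plain-text pass-through is only safe for text data.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.job.reduces=0 \
  -mapper cat \
  -input /data/logs \
  -output /data/logs-gz

# Step 2: pack the now-compressed directory into a single HAR.
# HAR stores members verbatim, so the gzip savings carry over.
hadoop archive -archiveName logs.har -p /data logs-gz /archives
```

This gets both goals: the data is smaller because the individual files are gzipped before archiving, and the file count drops because HAR bundles them behind one index.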
Upvotes: 0