dltu

Reputation: 34

Compress output of Hadoop Archive tool

I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folder contains many types of files, both small and large, so a Sequence File is not suitable).

I tried options such as -D mapreduce.compress.map.output=true -D mapred.map.ouput.compress.codec=org.apache.hadoop.io.compress.GzipCodec, but they did not work.

Does anyone know a way to compress the output of Hadoop Archive, or can you suggest a way to achieve both goals (smaller size and fewer files)?

Any information is appreciated. Thanks so much.

Upvotes: 2

Views: 544

Answers (1)

Praneeth Gudumasu

Reputation: 177

You can compress the data first with MapReduce output compression, then run har on the compressed directories.
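As a rough illustration of the "compress first, then archive" order, here is a minimal Python sketch. It is not a MapReduce job: it assumes the files are staged on a local filesystem before upload, gzips each one individually (so mixed file types are fine), and then only builds the argument list for the hadoop archive command. The function names gzip_tree and har_command and the example paths are hypothetical, chosen for this sketch.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gzip_tree(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Gzip every regular file under src_dir into dst_dir, mirroring the layout.

    Assumption: the data is available on a local filesystem at this point.
    """
    written = []
    # Materialize the file list first so newly written files are never re-read.
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst = dst.with_name(dst.name + ".gz")
        dst.parent.mkdir(parents=True, exist_ok=True)
        with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written


def har_command(archive_name: str, parent: str,
                sources: List[str], dest: str) -> List[str]:
    """Build the argument list for:
    hadoop archive -archiveName NAME -p PARENT SRC... DEST
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end in .har")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent, *sources, dest]
```

After uploading the compressed directory to HDFS, the list returned by har_command("logs.har", "/user/dltu", ["logs_gz"], "/user/dltu/archive") could be passed to subprocess.run on a machine with the Hadoop client installed. The archive then holds fewer, already-compressed files, although each file has to be gunzipped after it is read back.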

Upvotes: 0
