Raj Abhishek

Reputation: 61

Hadoop merge files

I ran a map-only job in Hive, for which Hive used 674 mappers, and it generated 674 .gz files. I want to merge these into around 30-35 files. I have tried the hive.merge.mapfiles property, but I am not getting merged output.

Upvotes: 6

Views: 7538

Answers (1)

Ambrish

Reputation: 3677

Try using the Tez execution engine together with hive.merge.tezfiles. You might also want to specify the target file sizes.

set hive.execution.engine=tez; -- use the Tez execution engine
set hive.merge.tezfiles=true; -- request a merge step at the end of the job
set hive.merge.smallfiles.avgsize=128000000; -- 128 MB
set hive.merge.size.per.task=128000000; -- 128 MB

If you want to stay with the MR engine, add the following settings instead (I haven't tried this personally):

set hive.merge.mapredfiles=true; -- request a merge step at the end of the job
set hive.merge.smallfiles.avgsize=128000000; -- 128 MB
set hive.merge.size.per.task=128000000; -- 128 MB

The above settings will spawn one more step to merge the files, and each resulting part file should be approximately 128 MB.
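Since the goal is a file count (around 30-35) rather than a file size, the size setting can be worked backwards from the total output size. Here is a rough sketch of that arithmetic. The function name and the 6 MB average file size are made up for illustration; substitute the real total size of your 674 files. The merge step only aims at this size, so the final count will be approximate. Also note that hive.merge.smallfiles.avgsize acts as the trigger: the merge step only runs when the average output file size is below it.

```python
import math


def merge_size_per_task(total_output_bytes: int, target_files: int) -> int:
    """Return a byte size to try for hive.merge.size.per.task.

    Hypothetical helper: divides the total output size by the number of
    files wanted, rounding up so the result never undershoots.
    """
    if target_files <= 0:
        raise ValueError("target_files must be positive")
    return math.ceil(total_output_bytes / target_files)


# Assumption for illustration only: 674 files averaging 6 MB each.
total_bytes = 674 * 6_000_000
size = merge_size_per_task(total_bytes, target_files=32)

# Print the setting in the same form as the ones above.
print(f"set hive.merge.size.per.task={size};")
```

If the real total is much larger or smaller, the computed value changes accordingly; the point is only that size per task is roughly total size divided by the number of files you want.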


Upvotes: 13
