How to Tune Hive Insert overwrite partition?

Question

I have written an INSERT OVERWRITE on a partition in Hive to merge all the files in that partition into larger files.

SQL:

SET hive.exec.compress.output=true;
set hive.merge.smallfiles.avgsize=2560000000;
set hive.merge.mapredfiles=true;
set hive.merge.mapfiles =true;
SET mapreduce.max.split.size=256000000;
SET mapreduce.min.split.size=256000000;
SET mapreduce.output.fileoutputformat.compress.type =BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

INSERT OVERWRITE TABLE ${source_database}.${table_name} PARTITION (${line}) 
 SELECT ${prepare_sel_columns} 
 from ${source_database}.${table_name} 
 WHERE ${partition_where_clause};

With the above settings I get compressed output, but generating the output files takes too long.

Even though it runs as a map-only job, it takes a long time.

I am looking for any further Hive-side settings to make the insert run faster.

Metrics:

15 GB of files takes about 10 minutes.

William R · Accepted Answer

SET hive.exec.compress.output=true;
SET mapreduce.input.fileinputformat.split.minsize=512000000; 
SET mapreduce.input.fileinputformat.split.maxsize=5120000000;
SET mapreduce.output.fileoutputformat.compress.type =BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

The settings above helped a lot: the duration came down from 10 minutes to 1 minute.
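
As far as I know, Hive does not validate non-Hive keys passed to SET, so a misspelled split-size key is accepted silently. A minimal sketch for confirming that the split settings are actually in effect in the session, and for checking the file count of the rewritten partition, could look like the following. The names my_db, my_table and dt='2020-01-01' are hypothetical placeholders, not taken from the question, and the numFiles/totalSize figures only appear if basic statistics are being gathered for the table.

```sql
-- SET with a key and no value prints the value currently in effect;
-- Hive reports "<key> is undefined" for a key that was never set.
SET mapreduce.input.fileinputformat.split.minsize;
SET mapreduce.input.fileinputformat.split.maxsize;
SET hive.input.format;

-- Hypothetical database, table and partition names.
USE my_db;

-- When basic stats are available, the partition parameters in the output
-- include numFiles and totalSize, which show whether the small files
-- were merged into fewer, larger ones.
DESCRIBE FORMATTED my_table PARTITION (dt='2020-01-01');
```

Running the DESCRIBE before and after the INSERT OVERWRITE gives a quick before/after comparison of the number of files in the partition.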
