William R

Reputation: 739

How to tune a Hive INSERT OVERWRITE PARTITION?

I have written an INSERT OVERWRITE PARTITION statement in Hive to merge all the small files in a partition into bigger files.

SQL:

SET hive.exec.compress.output=true;
SET hive.merge.smallfiles.avgsize=2560000000;
SET hive.merge.mapredfiles=true;
SET hive.merge.mapfiles=true;
-- split sizes (the canonical MRv2 property names are mapreduce.input.fileinputformat.split.{max,min}size)
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

INSERT OVERWRITE TABLE ${source_database}.${table_name} PARTITION (${line})
SELECT ${prepare_sel_columns}
FROM ${source_database}.${table_name}
WHERE ${partition_where_clause};
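
For concreteness, a minimal sketch of what the statement looks like once the shell variables are substituted; the database, table, column, and partition names here are hypothetical:

-- rewrite one static partition onto itself so Hive merges its files
INSERT OVERWRITE TABLE sales_db.transactions PARTITION (ds='2016-01-01')
SELECT txn_id, customer_id, amount
FROM sales_db.transactions
WHERE ds = '2016-01-01';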

With the above settings I get compressed output, but the time it takes to generate the output files is too long.

Even though it runs map-only jobs, it takes a lot of time.

I am looking for any further settings on the Hive side to tune the INSERT to run faster.

Metrics:

15 GB of files ==> takes 10 min.
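
As a back-of-the-envelope check of the mapper count this implies (treating the sizes as roughly decimal):

15 000 MB / 256 MB per split ≈ 60 splits ==> ~60 map tasks, each writing its own compressed output file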

Upvotes: 2

Views: 2263

Answers (1)

William R

Reputation: 739

SET hive.exec.compress.output=true;
SET mapreduce.input.fileinputformat.split.minsize=512000000;
SET mapreduce.input.fileinputformat.split.maxsize=5120000000;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

The above settings helped a lot; the duration came down from 10 min to 1 min.
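
A rough explanation of why this works: with hive.input.format left at Hive's default CombineHiveInputFormat, the split min/max sizes control how many small files get packed into each map task. Assuming the same 15 GB partition as above:

15 000 MB / 5 120 MB per split ≈ 3 splits ==> ~3 map tasks instead of ~60

Fewer mappers means far less task startup overhead, and each one writes a single large compressed file instead of many small ones.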

Upvotes: 1
