Reputation: 1008
I am executing the following Hive INSERT OVERWRITE query:
SET hive.merge.mapredfiles=false;
SET mapred.reduce.tasks=${hiveconf:NUM_REDUCERS};
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
SET hive.execution.engine=tez;
SET hive.tez.container.size=6144;
INSERT OVERWRITE TABLE CS.STAGING_AVRO PARTITION (CPD_DT='${hiveconf:CPD_DT}',CUT_NO='${hiveconf:CUT_NO}')
SELECT DISTINCT CS_TRAN.field1, CS_TRAN.field2 ....
FROM ( SELECT A, B, C, ... FROM NMS1.TBL1 WHERE ... ) CS_TRAN
JOIN NMS2.TBL2 ON ....
This INSERT OVERWRITE query joins data from two tables and writes the result into the CS.STAGING_AVRO table, partitioned by date and cut. The issue I am facing is that it creates one big file per partition. For example, in every partition it creates a single file of 3 GB. I want to split this one large file into many files of about 500 MB each, but I have not been able to achieve this.
I tried the following settings, but they didn't work:
SET mapred.max.split.size=1000000;
SET hive.exec.reducers.bytes.per.reducer=1000000;
SET hive.merge.size.per.task=1000000;
SET hive.merge.smallfiles.avgsize=1000000;
SET hive.exec.compress.output=true;
Please tell me what needs to be done to get this working. Any help is appreciated.
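From what I have read, one commonly suggested way to have several reducers each write their own ~500 MB file is to cap hive.exec.reducers.bytes.per.reducer and add a DISTRIBUTE BY over the final select. A minimal sketch only, reusing the query shape from above; the FLOOR(RAND()*6) fan-out expression is a placeholder, not something from my actual query:
SET hive.exec.reducers.bytes.per.reducer=524288000; -- ~500 MB per reducer
SET mapred.reduce.tasks=-1;                         -- let Hive derive the reducer count
INSERT OVERWRITE TABLE CS.STAGING_AVRO PARTITION (CPD_DT='${hiveconf:CPD_DT}',CUT_NO='${hiveconf:CUT_NO}')
SELECT * FROM (
    SELECT DISTINCT CS_TRAN.field1, CS_TRAN.field2 ....
    FROM ( SELECT A, B, C, ... FROM NMS1.TBL1 WHERE ... ) CS_TRAN
    JOIN NMS2.TBL2 ON ....
) T
DISTRIBUTE BY FLOOR(RAND()*6); -- placeholder: fan rows out over ~6 reducers (3 GB / 500 MB)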
Upvotes: 1
Views: 274
Reputation: 7387
Try this:
SET dfs.block.size=536870912;
SET parquet.block.size=536870912;
These will set the block size to 512 MB. If you want to increase it to 1 GB, just multiply by 2. But I am afraid you have to recreate the partitioned table, otherwise Hive won't create smaller files.
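For reference, the arithmetic in plain numbers (both values are bytes; 512 MB = 512 * 1024 * 1024, and doubling gives 1 GB):
-- 512 MB blocks
SET dfs.block.size=536870912;
SET parquet.block.size=536870912;
-- 1 GB blocks (512 MB * 2)
SET dfs.block.size=1073741824;
SET parquet.block.size=1073741824;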
Upvotes: 1