Outputting hive table to HDFS as a single file

Question

I'm trying to output the contents of a Hive table to HDFS as a single CSV file. However, when I run the code below, the output is split into 5 separate files of about 500 MB each. Am I missing something needed to write the results as one CSV file?

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;

notNull · Accepted Answer

Add an order by clause to your select query. A global order by forces Hive to run a single reducer, which writes only one file to the HDFS directory.

INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by <column_name>;

Note:

If the number of rows in the output is too large, the single reducer could take a very long time to finish.
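
As a rough end-to-end sketch of the approach above, the statement below keeps the Tez setting from the question and adds the sort. The column name `id` is a hypothetical placeholder, not something taken from the original table; substitute any column that exists in `schema.mytable`.

```sql
-- Sketch only: `id` is an assumed column name used for illustration.
set hive.execution.engine=tez;

-- The global order by funnels all rows through one reducer,
-- so the target directory should end up with a single output file.
INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by id;
```

Because every row passes through that one reducer, this trades parallelism for a single output file, which is the cost described in the note above.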
