Rossy

Reputation: 65

Outputting hive table to HDFS as a single file

I'm trying to output the contents of a Hive table to HDFS as a single CSV file. However, when I run the code below, the output is split into 5 separate files of ~500 MB each. Am I missing something needed to write the results as one single CSV file?

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;

Upvotes: 0

Views: 966

Answers (1)

notNull

Reputation: 31540

Add an ORDER BY clause to your SELECT query. Hive then has to produce a global ordering, so it runs the final stage with a single reducer, which writes only one file to the HDFS directory.

INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by <col_name>;

Note:

If the number of rows in the output is too large, the single reducer could take a very long time to finish.
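
If a sorted output is not needed, another option that may be worth trying is to raise the merge thresholds so that the merge step enabled by hive.merge.tezfiles actually applies to files of this size. With the default thresholds, files of ~500 MB are likely already considered large enough and are left unmerged. The sketch below reuses the query from the question; the byte values are assumptions sized for roughly 2.5 GB of total output, not tuned recommendations, and whether exactly one file results can depend on the Hive version and cluster settings.

```sql
-- Sketch only: threshold values are illustrative assumptions (~2.5 GB total output).
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
-- Run the merge step when the average output file is smaller than this (bytes).
set hive.merge.smallfiles.avgsize=3000000000;
-- Target size of each merged file (bytes); chosen to exceed the expected total output.
set hive.merge.size.per.task=3000000000;

INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;
```

Check the target directory afterwards to confirm how many files were written.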

Upvotes: 1
