Reputation: 65
I'm trying to write the contents of a Hive table to HDFS as a single CSV file. However, when I run the code below, the output is split into 5 separate files of ~500 MB each. Am I missing something needed to get the results as one single CSV file?
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;
Upvotes: 0
Views: 966
Reputation: 31540
Add an ORDER BY clause to your select query. Hive then has to run
a single reducer for the final sort, so only one file is created
in the HDFS directory.
INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by <col_name>;
Note:
If the output has a very large number of rows, the single reducer
could take a very long time to finish, because all of the data passes through that one task.
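To check whether the query really produced one file, the output directory can be listed once the job finishes. This is a minimal sketch that assumes you are working in the Hive CLI, which accepts dfs commands directly; other clients such as Beeline may restrict them depending on configuration. The path is the one used in the question.

```sql
-- Sketch only: run in the Hive CLI after the INSERT OVERWRITE DIRECTORY job completes.
-- Lists the files under the output directory from the question;
-- with the ORDER BY in place, a single data file is expected.
dfs -ls /dl/folder_name;
```

If several files still appear, the final stage most likely ran with more than one task.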
Upvotes: 1