Reputation: 49
I am learning Hadoop and have run into a problem. I ran a MapReduce job and the output was stored in multiple files rather than a single file. I want to append all of them into a single file in HDFS. I know about the appendToFile and getmerge commands, but they only work from the local file system to HDFS, or from HDFS to the local file system, not from HDFS to HDFS. Is there any way to append the output files in HDFS into a single file in HDFS without touching the local file system?
Upvotes: 0
Views: 842
Reputation: 191701
The only way to do this would be to force your MapReduce code to use a single reducer, for example by sorting all of the results under a single key.
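If the job's driver goes through ToolRunner/GenericOptionsParser, the reducer count can also be forced from the command line rather than in code. A minimal sketch, with an assumed jar name, driver class, and paths:

hadoop jar my-job.jar com.example.MyDriver -D mapreduce.job.reduces=1 /user/hadoop/input /user/hadoop/output

(On older Hadoop releases the equivalent property is mapred.reduce.tasks.)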
However, this defeats the purpose of having a distributed filesystem and multiple processors. All Hadoop jobs should be able to read a directory of files rather than being limited to processing a single file.
If you need a single file to download from HDFS, then you should use getmerge
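For reference, getmerge concatenates every file under an HDFS directory into one file on the local machine. A minimal sketch with assumed paths:

hadoop fs -getmerge /user/hadoop/output /tmp/merged_output.txt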
Upvotes: 1
Reputation: 452
There is no easy way to do this directly in HDFS, but the trick below works. It is not an ideal solution, but it should do the job if the output is not huge.
hadoop fs -cat source_folder_path/* | hadoop fs -put - target_filename
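For example, to merge only the part files of a job's output directory into a new HDFS file (the paths here are assumed):

hadoop fs -cat /user/hadoop/output/part-* | hadoop fs -put - /user/hadoop/merged_output.txt

Matching part-* skips the empty _SUCCESS marker, and the data only streams through the client machine running the command; it is never written to its local disk.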
Upvotes: 0