Reputation: 49
I am learning Hadoop and have run into a problem. I ran a MapReduce job and the output was stored in multiple files rather than a single file. I want to append all of them into a single file in HDFS. I know about the appendToFile and getmerge commands, but they only work from the local file system to HDFS, or from HDFS to the local file system, not from HDFS to HDFS. Is there any way to append the output files in HDFS into a single file in HDFS without touching the local file system?
Upvotes: 0
Views: 842
Reputation: 191701
The only way to do this would be to force your MapReduce code to use a single reducer, for example by sorting all of the results under a single key.
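If the job's driver goes through ToolRunner/GenericOptionsParser, the reducer count can also be forced from the command line rather than in code. A minimal sketch, with an assumed jar name, driver class, and paths:

hadoop jar my-job.jar com.example.MyDriver -D mapreduce.job.reduces=1 /user/hadoop/input /user/hadoop/output

(On older Hadoop releases the equivalent property is mapred.reduce.tasks.)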
However, this defeats the purpose of having a distributed filesystem and multiple processors. All Hadoop jobs should be able to read a directory of files rather than being limited to processing a single file.
If you need a single file to download from HDFS, then you should use getmerge
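For reference, getmerge concatenates every file under an HDFS directory into one file on the local machine. A minimal sketch with assumed paths:

hadoop fs -getmerge /user/hadoop/output /tmp/merged_output.txt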
Upvotes: 1
Reputation: 452
There is no easy way to do this directly in HDFS, but the trick below works. It is not an ideal solution, but it should do the job if the output is not huge.
hadoop fs -cat source_folder_path/* | hadoop fs -put - target_filename
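For example, to merge only the part files of a job's output directory into a new HDFS file (the paths here are assumed):

hadoop fs -cat /user/hadoop/output/part-* | hadoop fs -put - /user/hadoop/merged_output.txt

Matching part-* skips the empty _SUCCESS marker, and the data only streams through the client machine running the command; it is never written to its local disk.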
Upvotes: 0