Reputation: 49
Could anybody give me advice on how to efficiently merge many small files from a normal file system into a single file in HDFS?
Upvotes: 0
Views: 137
Reputation: 373
In case your files exist on Linux you can try this command:
cat *.txt | hadoop fs -put - mergedFile.log
This streams the concatenated output directly into HDFS, with no need for an intermediate merge.log file on the local disk.
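As a minimal sketch (the file names and scratch directory here are illustrative, not from the original answer), the local concatenation step can be exercised on its own; the `hadoop fs -put -` step needs a running cluster, so it is shown only as a comment:

```shell
# Create a scratch directory with a couple of small sample files
# (names and contents are made up for illustration).
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\n'  > "$workdir/b.txt"

# Concatenate all small files into a single local file.
cat "$workdir"/*.txt > "$workdir/merged.log"

# With a cluster available, stream straight into HDFS instead of
# writing a local intermediate file (destination name is illustrative):
# cat "$workdir"/*.txt | hadoop fs -put - mergedFile.log

cat "$workdir/merged.log"
```

Shell globs expand in sorted order, so the merged file's contents follow the lexicographic order of the input file names.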
Upvotes: 1
Reputation: 194
hadoop fs -getmerge <src> <localdst> [addnl]
-getmerge: Get all the files in the directories that match the source file pattern and merge and sort them into a single file on the local file system. The source in HDFS is kept.
Example: hadoop fs -getmerge /user/hdfs/test/ /home/hdfs/Desktop/merge, where /user/hdfs/test/ is the HDFS directory containing the files to be merged and /home/hdfs/Desktop/merge is the local destination path where the merged file will be written.
Upvotes: 1
Reputation: 8664
You could consider the techniques below.
This is a common problem and you should be able to find plenty of material by searching; this blog here should also give you some pointers.
Let me know if you need help with something more specific.
Upvotes: 1