Reputation: 7661
I would like to know how the getmerge command works at the OS/HDFS level. Will it copy each and every byte/block from one file to another file, or is it just a simple file-descriptor change? How costly an operation is it?
Upvotes: 0
Views: 1037
Reputation: 191711
getmerge
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and concatenates files in src into the destination local file. Optionally addnl can be set to enable adding a newline character at the end of each file.
So, to answer your questions:
Will it copy each and every byte/blocks from one file to another file
Yes and no: it copies every byte, but not from one HDFS file to another. It will find every HDFS block containing the files in the given source directory and concatenate them together into a single file on your local filesystem.
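Conceptually, the client-side work looks something like the sketch below, where the local filesystem stands in for HDFS (the function name and chunk size are illustrative assumptions, not the actual FsShell implementation):

```python
import os

def getmerge(src_dir, local_dst, addnl=False):
    """Rough sketch of what `hadoop fs -getmerge` does on the client side:
    stream every file in the source directory, in order, into one local
    destination file. Here a local directory stands in for HDFS; the real
    command reads each block from the DataNodes over the network."""
    with open(local_dst, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as part:
                # Copy in fixed-size chunks, much as the HDFS client
                # streams one block at a time.
                while chunk := part.read(64 * 1024):
                    out.write(chunk)
            if addnl:
                # The addnl option appends a newline after each file.
                out.write(b"\n")
```

The point of the sketch is that every byte of every source file passes through the client; nothing is merged in place on the cluster.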
a simple file descriptor change
Not sure what you mean by that. getmerge doesn't change any file descriptors; it just reads data from HDFS to your local filesystem.
How costly an operation is it?
Expect it to be as costly as manually cat-ing all the files in an HDFS directory. The same operation for
hadoop fs -getmerge /tmp/ /home/user/myfile
could be achieved by doing
hadoop fs -cat /tmp/* > /home/user/myfile
The costly operations are fetching the block locations for many files from the NameNode and transferring all of that data over the network to your local disk.
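To make that cost concrete, here is a hypothetical back-of-the-envelope model (the function, the 128 MB default block size, and the cost model are my own assumptions, not Hadoop code):

```python
import math

def getmerge_cost(file_sizes, block_size=128 * 1024 * 1024):
    """Hypothetical estimate of getmerge's work: every block of every
    source file must be read, so the bytes moved over the network equal
    the total data size, and the number of block fetches grows with both
    the file count and the file sizes."""
    block_fetches = sum(math.ceil(size / block_size) for size in file_sizes)
    return {
        "bytes_over_network": sum(file_sizes),  # every byte crosses the wire
        "block_fetches": block_fetches,         # roughly one read per HDFS block
    }
```

So the operation scales linearly with the total amount of data in the source directory, not with the number of files alone.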
Upvotes: 3