Nageswaran

Reputation: 7661

How does getMerge work in Hadoop?

I would like to know how the getmerge command works at the OS/HDFS level. Does it copy each and every byte/block from one file to another, or is it just a simple file descriptor change? How costly an operation is it?

Upvotes: 0

Views: 1037

Answers (1)

OneCricketeer

Reputation: 191711

getmerge

Usage: hadoop fs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and concatenates files in src into the destination local file. Optionally addnl can be set to enable adding a newline character at the end of each file.
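The addnl behavior can be illustrated without a cluster. The sketch below simulates it on the local filesystem with plain cat (the directory and part-file names are made up for the demo; this is not Hadoop's actual implementation, just the concatenation semantics it documents):

```shell
# Hypothetical local stand-in for an HDFS source directory with two part files.
mkdir -p /tmp/getmerge_demo/src
printf 'alpha' > /tmp/getmerge_demo/src/part-00000
printf 'beta'  > /tmp/getmerge_demo/src/part-00001

# Without addnl: file contents are concatenated back to back.
cat /tmp/getmerge_demo/src/part-* > /tmp/getmerge_demo/merged
# merged now holds "alphabeta"

# With addnl: a newline is appended after each file's contents.
: > /tmp/getmerge_demo/merged_addnl
for f in /tmp/getmerge_demo/src/part-*; do
  cat "$f" >> /tmp/getmerge_demo/merged_addnl
  printf '\n' >> /tmp/getmerge_demo/merged_addnl
done
# merged_addnl now holds "alpha\nbeta\n"
```

addnl is mainly useful when the source files are records that do not end with their own newline, so the merged file stays line-oriented.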

So, to answer your question,

Will it copy each and every byte/blocks from one file to another file

Yes, and no. It will find every HDFS block containing the files in the given source directory and concatenate them together into a single file on your local filesystem.

a simple file descriptor change

Not sure what you mean by that. getmerge doesn't change any file descriptors; it is just reading data from HDFS to your local filesystem.

How costly an operation is it?

Expect it to be about as costly as manually cat-ing all the files in an HDFS directory. For example, the operation

hadoop fs -getmerge /tmp/ /home/user/myfile

could be achieved by doing

hadoop fs -cat /tmp/* > /home/user/myfile

The costly part is opening file handles for many files and transferring all of their contents over the network to your local disk.
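The equivalence of the two commands above can be checked locally, simulating the HDFS directory with an ordinary one (paths and contents below are invented for the demo):

```shell
# Hypothetical local stand-in for an HDFS directory holding two part files.
mkdir -p /tmp/merge_equiv/src
printf 'one\n' > /tmp/merge_equiv/src/part-00000
printf 'two\n' > /tmp/merge_equiv/src/part-00001

# What "hadoop fs -getmerge src dst" effectively produces: the parts
# streamed one after another into a single local file.
cat /tmp/merge_equiv/src/part-00000 /tmp/merge_equiv/src/part-00001 \
  > /tmp/merge_equiv/merged_explicit

# The shell-glob equivalent of "hadoop fs -cat /tmp/* > myfile".
cat /tmp/merge_equiv/src/* > /tmp/merge_equiv/merged_glob

# The two results are byte-for-byte identical.
cmp /tmp/merge_equiv/merged_explicit /tmp/merge_equiv/merged_glob
```

One caveat with the glob form: the shell expands `*` in lexical order, which happens to match the part-file naming convention, but on a real cluster getmerge decides the ordering itself rather than relying on your shell.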

Upvotes: 3
